Wednesday, November 4, 2009

Unindexed Properties

Did you know that, by default, the App Engine Datastore writes two index records for every entity property that isn't a com.google.appengine.api.datastore.Blob or a com.google.appengine.api.datastore.Text?  It's true!  These index records allow you to execute a variety of queries involving the property without creating a composite index.
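
To make the cost concrete, here is a rough sketch that counts the default index records for a single entity write. It is a toy model in plain Java, not the Datastore API: the IndexWriteCost class and the indexRecords helper are made up for illustration, the sample values are invented, and the only rule it encodes is the "two index records per indexed property" figure above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model only: class and method names are hypothetical, not part of any App Engine API.
class IndexWriteCost {
    /** Counts default index records for one entity, assuming two per indexed property. */
    static int indexRecords(Map<String, Object> properties, Set<String> unindexed) {
        int records = 0;
        for (String name : properties.keySet()) {
            if (!unindexed.contains(name)) {
                records += 2; // the "two index records" written by default
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Map<String, Object> book = new LinkedHashMap<>();
        book.put("title", "Dune");
        book.put("coverImageUrl", "http://example.com/dune.png");

        System.out.println(indexRecords(book, Set.of()));                // prints 4
        System.out.println(indexRecords(book, Set.of("coverImageUrl"))); // prints 2
    }
}
```

Each property you exclude removes two index records from every write of that entity in this model.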

Now, these index records don't come for free.  They take time to write and they take up space on disk.  If you have a property that you're absolutely, positively sure you'll never want to filter or sort by, you can opt out of the default indexing.  Let's look at an example.  We'll use a Book class with a com.google.appengine.api.datastore.Link property that holds the URL of an image of the Book's cover.  Our assumption here is that we never need to select or sort Books based on the value of this property.


JPA:
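import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import com.google.appengine.api.datastore.Link;
import org.datanucleus.jpa.annotations.Extension;

@Entity
public class Book {
  @Id
  @GeneratedValue(strategy=GenerationType.IDENTITY)
  private Long id;

  // Opt out of the default index records for this property.
  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}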


JDO:
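import javax.jdo.annotations.Extension;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;
import com.google.appengine.api.datastore.Link;

@PersistenceCapable(identityType=IdentityType.APPLICATION)
public class Book {
  @PrimaryKey
  @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
  private Long id;

  // Opt out of the default index records for this property.
  @Persistent
  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}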


In both examples, the "gae.unindexed" extension tells App Engine that you want to opt out of the default indexing.  Now remember, requirements change over time.  Just because you're not filtering or sorting by a property today doesn't mean you won't filter or sort by that property tomorrow, so think hard before you choose to mark something as 'unindexed.'

Of course, no matter how well you've planned, you'll eventually find yourself in a situation where you have existing data and you need to change a property from indexed to unindexed or from unindexed to indexed.  What do you do?
Let's start with the easy one - going from indexed to unindexed.

If you change a property from indexed to unindexed, the index records for each existing entity with that property will continue to exist until you update (or delete) that entity.  Any new entities will be created without index records for the newly unindexed property.  As long as you've taken care to purge your code of all queries that filter or sort by the now-unindexed property, you'll be ready to go as soon as you upload your new application version.
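
Here is a small simulation of that behavior, assuming a toy in-memory store in which every put rewrites an entity's index records using whatever configuration is in effect at that moment. StaleIndexDemo and its methods are hypothetical names for illustration; none of this is Datastore code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: these names are hypothetical, not part of any App Engine API.
class StaleIndexDemo {
    // entity id -> names of properties that currently have index records
    final Map<Long, Set<String>> indexRows = new HashMap<>();
    // properties marked unindexed in the "currently deployed" version
    final Set<String> unindexed = new HashSet<>();

    /** Writing an entity replaces its index records using the current configuration. */
    void put(long id, Map<String, Object> properties) {
        Set<String> indexedNow = new HashSet<>(properties.keySet());
        indexedNow.removeAll(unindexed);
        indexRows.put(id, indexedNow);
    }

    boolean hasIndexRecord(long id, String property) {
        return indexRows.getOrDefault(id, Set.of()).contains(property);
    }

    public static void main(String[] args) {
        StaleIndexDemo store = new StaleIndexDemo();
        Map<String, Object> book =
            Map.of("title", "Dune", "coverImageUrl", "http://example.com/dune.png");

        store.put(1L, book);                  // written while coverImageUrl is indexed
        store.unindexed.add("coverImageUrl"); // new version marks it unindexed
        store.put(2L, book);                  // new entity gets no index record

        System.out.println(store.hasIndexRecord(1L, "coverImageUrl")); // prints true
        System.out.println(store.hasIndexRecord(2L, "coverImageUrl")); // prints false

        store.put(1L, book);                  // updating the old entity drops the record
        System.out.println(store.hasIndexRecord(1L, "coverImageUrl")); // prints false
    }
}
```

The old index records are harmless in this direction: they only cost disk space until the entity is next written.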

Changing a property from unindexed to indexed is more difficult.  Presumably you're doing this because you need to filter or sort by that property in your queries.  Entities without index records for the filter and sort properties are automatically excluded from the result set, so queries involving the newly indexed property will only return the results you expect once index records exist for all the entities that were written before you made the property indexed.  Yikes.  I told you to think hard, didn't I?
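
To see why old entities go missing, consider this toy model in which a filter is answered purely from index records and never by scanning the entities themselves. MissingIndexQueryDemo is a hypothetical class written for illustration, and it assumes for the sake of the example that title was the property that used to be unindexed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: these names are hypothetical, not part of any App Engine API.
class MissingIndexQueryDemo {
    final Map<Long, String> titles = new HashMap<>();           // the stored entities
    final Map<String, List<Long>> titleIndex = new HashMap<>(); // value -> ids with an index record

    /** Stores the entity; writes an index record only if the property is indexed at write time. */
    void put(long id, String title, boolean indexed) {
        titles.put(id, title);
        if (indexed) {
            titleIndex.computeIfAbsent(title, k -> new ArrayList<>()).add(id);
        }
    }

    /** A filter on title consults the index only, so entities without index records never match. */
    List<Long> queryByTitle(String title) {
        return titleIndex.getOrDefault(title, List.of());
    }

    public static void main(String[] args) {
        MissingIndexQueryDemo store = new MissingIndexQueryDemo();
        store.put(1L, "Dune", false); // saved while title was unindexed
        store.put(2L, "Dune", true);  // saved after title became indexed

        System.out.println(store.titles.size());        // prints 2
        System.out.println(store.queryByTitle("Dune")); // prints [2]
    }
}
```

Both entities are stored with the same title, but only the one written after the change is visible to the query.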

So what do you do?  Well, the missing index records will be created whenever you rewrite an entity, so you'll need to map over all entities of the appropriate kind and "touch" each one (fetch it then put it).  You can use the Task Queue to break this work up, and when Datastore Cursors are released (coming soon!) this will be even easier.  Still, you've got some work to do.
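
The "touch everything in chunks" idea looks roughly like the sketch below. It is plain Java with an in-memory list standing in for the entities of one kind, and a Consumer standing in for the put call; TouchAllDemo and touchAll are hypothetical names. In a real application each batch would be fetched from the datastore and could be handed to its own task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: these names are hypothetical, not part of any App Engine API.
class TouchAllDemo {
    /**
     * Re-saves every entity in fixed-size batches and returns the number of batches.
     * Each batch is a unit of work that could run as a separate task.
     */
    static <T> int touchAll(List<T> entities, int batchSize, Consumer<T> put) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        int batches = 0;
        for (int start = 0; start < entities.size(); start += batchSize) {
            int end = Math.min(start + batchSize, entities.size());
            for (T entity : entities.subList(start, end)) {
                put.accept(entity); // "touch": write the entity back unchanged
            }
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> books = List.of("a", "b", "c", "d", "e");
        List<String> rewritten = new ArrayList<>();

        int batches = touchAll(books, 2, rewritten::add);
        System.out.println(batches + " batches, " + rewritten.size() + " entities rewritten");
        // prints: 3 batches, 5 entities rewritten
    }
}
```

Keeping the batch size small keeps each unit of work short, which matters if every batch has to finish within a single request or task.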

My final warning: unindexed properties are an optimization, and like any optimization they can be applied prematurely.  Using unindexed properties will speed up writes and reduce disk usage, but will it do so enough to make the optimization worthwhile?  I can't answer that question for you.  The impact will depend on how much data you have and how often you're writing it.

Comments:

  1. Hi Max,
    Will the two index records be created also for a field expressing a one-to-many relationship?
    I mean for a field like:

    @Persistent
    private List list = new ArrayList();

    Because I don't know whether it is (or will be) possible to query on such fields.

  2. Today the parent object does not have any properties containing the keys of its children, but in a future release this will change. There's no harm in marking a relationship field as unindexed. For now it will be a no-op, but when the parent object has references to its children's keys it will prevent these references from getting indexed.

  3. Hi, Patriczio Munzi asked a great question: how do you search for any Book that contains a given chapter ("Prologe"), for example?

    I'm creating an app, and without this feature the app makes no sense.

  4. Returning all Books that contain a chapter named "Prologe" requires either a join (which we don't currently support) or denormalization. In your example it shouldn't be too hard to store chapter names on the Book; you just have to be diligent about updating the Book whenever you add or remove a chapter.

  7. Hi Max,
    Great post, it could help me a lot. By the way, in the JPA example, what's the package of the @Extension annotation?

  8. I think I've found it: org.datanucleus.api.jpa.annotations from datanucleus-api-jpa.jar.

  9. Sorry for polluting your blog post; the package I wrote above wasn't correct. The correct package is: org.datanucleus.jpa.annotations
