Tuesday, November 24, 2009

Prerelease Of The Next SDK

Hi all,

Just a quick post to let you know that we've made a prerelease of SDK 1.2.8 available at http://code.google.com/p/googleappengine/downloads/list. This is for local development only, but should still give you an opportunity to try out our new features and let us know if anything looks amiss. Persistence-related features of note:
- Support for inheritance in JDO and JPA (finally!)
- Support for != filters in the low level api, JDO, and JPA
- Support for IN filters in the low level api, JDO, and JPA

All the gory details can be found in the release notes of the download.

Please give it a try!

Friday, November 20, 2009

Case Insensitive Queries

Hi everyone.  I've been posting a bit less frequently the past few weeks because the team has really been working hard on our next release.  It's almost ready.....we're excited to get it out to you!  On to today's topic....

I see questions about case insensitive queries on our message board pretty regularly.  We usually give people the same answer, but rather than continuing to describe the solution, why not just provide it?  Here we go.

If you're using a relational database it's pretty easy to execute case-insensitive queries because it's a native feature of the database.  It usually looks something like this:

 select * from Person where UPPER(lastName) = 'ROSS'

This is a challenge for the App Engine Datastore because we rely on executing scans over a limited subset of your data and returning results as we encounter them.  Consider the following strings in lexicographic order:
...
ROSE
...
ROSS
...
ROSTA
...
Rose
...
Ross
...
Rosta
...
rose
...
ross
...
rosta
...

As you can see there could be an unlimited number of rows in between 'ROSS' and 'Ross', and also between 'Ross' and 'ross' (okay not unlimited, we do have a limit on indexed string length, but definitely Large), so if we start scanning at 'ROSS' we might have to skip a huge number of results before we hit 'Ross', and that doesn't scale.  This is why the datastore doesn't support case-insensitive queries.

Fortunately it's not hard to implement support for case-insensitive queries in your application.  Here's the approach: for each field that you want to query in a case-insensitive way, create a duplicate field that stores the value of that field in either all upper or lowercase letters.  Then, have your model object register a pre-persist callback with your persistence framework (JPA or JDO), and then populate the duplicate fields inside that callback.  Here's what it looks like.

JPA:
import javax.persistence.PrePersist;
import javax.persistence.PreUpdate;
 

@Entity
public class Person {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  private String lastName; 
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }


  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  @PrePersist
  @PreUpdate
  public void prePersist() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  } 

}
 
public List<Person> getPeopleByLastName(String lastName) {
  Query q = em.createQuery("select from Person where uppercaseLastName = :p");
  q.setParameter("p", lastName.toUpperCase());
  return (List<Person>) q.getResultList(); 

}


JDO:
import javax.jdo.listener.StoreCallback;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Person implements StoreCallback {
  @PrimaryKey
  @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
  private Long id;
  private String lastName;
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }

  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  public void jdoPreStore() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  }
}

public List<Person> getPeopleByLastName(String lastName) { 
  Query q = pm.newQuery(Person.class, "uppercaseLastName == :p"); 
  return (List<Person>) q.execute(lastName.toUpperCase()); 
}

Note that in both examples I have not created a getter or a setter for the uppercaseLastName field.  That's because this field is part of the interface we're exposing to the persistence framework but not part of the public interface of the model object.  Your query code can filter by the uppercaseLastName field but the client code that interacts with the Person object has no reason to know that there are duplicate fields that are used to support case-insensitive queries.

Wednesday, November 4, 2009

Unindexed Properties

Did you know that, by default, the App Engine Datastore writes two index records for every entity property that isn't a com.google.appengine.api.datastore.Blob or a com.google.appengine.api.datastore.Text?  It's true!  These index records allow you to execute a variety of queries involving the property without creating a composite index.

Now, these index records don't come for free.  They take time to write and they take up space on disk.  If you have a property that you're absolutely positively sure you'll never want to filter or sort by, you can opt-out of the default indexing that is going on.  Let's look at an example.  We'll use a Book class with a com.google.appengine.api.datastore.Link property that is the URL of an image of the Book's cover.  Our assumption here is that we don't need to select Books based on the value of this property and we don't need to sort them based on the value of this property.


JPA:
@Entity
public class Book {
  @Id
  @GeneratedValue(strategy=GenerationType.IDENTITY)
  private Long id;

  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}


JDO:
@PersistenceCapable(identityType=IdentityType.APPLICATION)
public class Book {
  @PrimaryKey
  @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
  private Long id;

  @Persistent
  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}


In both examples, the "gae.unindexed" extension tells App Engine that you want to opt out of the default indexing.  Now remember, requirements change over time.  Just because you're not filtering or sorting by a property today doesn't mean you won't filter or sort by that property tomorrow, so think hard before you choose to mark something as 'unindexed.'

Of course, no matter how well you've planned you'll find yourself in a situation where you have existing data and you need to change a property from indexed to unindexed or from unindexed to indexed.  What do you do?
Let's start with the easy one - going from indexed to unindexed.

If you change a property from indexed to unindexed, the index records for all existing entities with that property will continue to exist until you update (or delete) the entity with which those index records are associated.  Any new entities will be created without the index records for the newly unindexed property.  As long as you've taken care to purge your code of all queries that filter or sort by the property that is now unindexed you'll be ready to go as soon as you upload your new application version.

Changing a property from unindexed to indexed is more difficult.  Presumably you're doing this because you need to filter or sort by that property in your queries, and since entities without index records for the filter and sort properties are automatically excluded from the result set, the only way that your queries involving the newly indexed property are going to return the results you expect is if you create index records for all the entities that existed before you made the property indexed.  Yikes.  I told you to think hard, didn't I?

So what do you do?  Well, the missing index records will be created whenever you rewrite an entity, so you'll need to map over all entities of the appropriate kind and "touch" each one (fetch it then put it).  You can use the Task Queue to break this work up, and when Datastore Cursors are released (coming soon!) this will be even easier.  Still, you've got some work to do.

My final warning: Unindexed properties are an optimization, and just like any optimization it is possible to invoke it prematurely.  Using unindexed properties will speed up writes and reduce disk usage, but will it speed up writes and reduce disk usage enough to make the optimization worthwhile?  I can't answer that question for you.  The impact will depend on how much data you have and how often you're writing it.