Wednesday, December 30, 2009

More compact @PersistenceCapable declarations

When I started writing I said I was going to treat JDO and JPA equally in these posts, but I've just learned something so earth-shatteringly helpful that happens to be specific to JDO. What would you have me do? Should I withhold valuable information just because it doesn't have a corresponding representation in JPA? That hardly seems right. How about a compromise: I'm going to share the JDO nugget and I promise to make it up to the JPA users at some point in the future. Sound reasonable? We have a deal? Ok, good. Now hold on to your hats, here we go:

I've always written PersistenceCapable declarations like this:

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Foo {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    // ...
}

Kinda verbose, right? It turns out that if you omit 'identityType' DataNucleus will still do the right thing so long as you've set your PrimaryKey up properly:

@PersistenceCapable
public class Foo {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    // ...
}

I just saved you 41 characters. Happy new year everyone!

Friday, December 4, 2009

Queries with != and IN filters

Since we launched Google App Engine for Java back in April there has been a sizable disparity between the persistence features of Python and Java: Python had support for the != and IN operators in query filters and Java did not. With the release of SDK 1.2.8 this disparity is now gone (hooray!), but before you go rushing out to take advantage of these new operators I want to take some time to explain how the performance of queries that use these operators is a little bit different from what you're used to.

The key difference is that the datastore does not natively support != and IN. Instead, these operators are implemented in "userland." The code lives in appengine-api.jar, and it knows how to transform queries that use != and IN into queries that the datastore natively supports. Let's look at an example. Suppose you want to issue a query that returns all people not named Max, ordered by name and age.

Under the hood, this query gets broken up into two queries. The first query returns all people whose name is lexicographically smaller than Max, ordered by name and age. The second query returns all people whose name is lexicographically larger than Max, ordered by name and age.

JPA:
select from Person where name <> "Max" order by name, age

is translated into

select from Person where name < "Max" order by name, age
select from Person where name > "Max" order by name, age

JDO:
select from Person where name != "Max" order by name, age

is translated into



select from Person where name < "Max" order by name, age 
select from Person where name > "Max" order by name, age

The results of these two queries are then merged, with the sort applied in-memory as we go. Both result sets are ordered by the same properties, so we can determine the next result to return by comparing the two results at the front of the two result sets. This makes the in-memory sort efficient. We also need to de-dupe as we go because a multi-value property may place an Entity in both result sets. The performance of this query is equivalent to the performance of the two underlying queries with some additional memory consumption because we need to maintain references to all the results we've already returned in order to de-dupe.

IN filters are implemented in similar fashion, but instead of requiring a fixed number of queries to fulfill they require N queries, where N is the number of values in the IN clause.  Suppose you want all people whose favorite foods are cheeseburgers, pizza, and fried chicken, ordered by favoriteFood and age.

Under the hood this query gets broken up into three queries. The first query returns all people whose favorite food is cheeseburgers, ordered by favoriteFood and age. The second query returns all people whose favorite food is pizza, ordered by favoriteFood and age. The third query includes all people whose favorite food is fried chicken, ordered by favoriteFood and age.

JPA:
select from Person where
    favoriteFood IN ('cheeseburger', 'pizza', 'fried chicken')
    order by favoriteFood, age

is translated into

select from Person where favoriteFood = "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood = "pizza" order by favoriteFood, age
select from Person where favoriteFood = "fried chicken" order by favoriteFood, age

JDO:
Query q = pm.newQuery(
    "select from Person where :p1.contains(favoriteFood) order by favoriteFood, age");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"));

is translated into

select from Person where favoriteFood == "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood == "pizza" order by favoriteFood, age
select from Person where favoriteFood == "fried chicken" order by favoriteFood, age

Once again we merge the results of these three queries, sorting and de-duping as we go. Once again the performance of this query is equivalent to the performance of the underlying queries with some additional memory consumption.

Can you issue a query that combines != and IN? Absolutely. Can you have multiple IN filters? Absolutely again! But, be careful, if your query requires more than 30 underlying queries to fulfill you'll get an exception. Here's a query that requires too many underlying queries to fulfill:

JPA:
select from Person where 
    name <> "Max" AND 
    favoriteFood IN ("cheeseburger", "pizza", "fried chicken") AND
    age IN (10, 11, 12, 13, 14, 15, 16, 17, 18 19) order by name

is translated into

select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 11 order by name

.... and so on.

JDO:
Query q = pm.newQuery("select from Person where " + 
    "name != 'Max' AND " +
    ":p1.contains(favoriteFood) && :p2.contains(age) order by name");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"),
    Arrays.asList(10, 11, 12, 13, 14, 15, 16, 17, 18, 19));

is translated into

select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 11 order by name

.... and so on.

One final note that isn't entirely related but that I'll tack on anyway: You can now use the 'OR' operator in your queries as long as the query can be rewritten using IN. That means this now works:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR 
    favoriteFood = "pizza" OR 
    favoriteFood = "fried chicken"

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || 
    favoriteFood == "pizza" || 
    favoriteFood == "fried chicken"

But this still doesn't:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR age = 10

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || age == 10

Enjoy the new operators!

Tuesday, November 24, 2009

Prerelease Of The Next SDK

Hi all,

Just a quick post to let you know that we've made a prerelease of SDK 1.2.8 available at http://code.google.com/p/googleappengine/downloads/list. This is for local development only, but should still give you an opportunity to try out our new features and let us know if anything looks amiss. Persistence-related features of note:
- Support for inheritance in JDO and JPA (finally!)
- Support for != filters in the low level api, JDO, and JPA
- Support for IN filters in the low level api, JDO, and JPA

All the gory details can be found in the release notes of the download.

Please give it a try!

Friday, November 20, 2009

Case Insensitive Queries

Hi everyone.  I've been posting a bit less frequently the past few weeks because the team has really been working hard on our next release.  It's almost ready.....we're excited to get it out to you!  On to today's topic....

I see questions about case insensitive queries on our message board pretty regularly.  We usually give people the same answer, but rather than continuing to describe the solution, why not just provide it?  Here we go.

If you're using a relational database it's pretty easy to execute case-insensitive queries because it's a native feature of the database.  It usually looks something like this:

 select * from Person where UPPER(lastName) = 'ROSS'

This is a challenge for the App Engine Datastore because we rely on executing scans over a limited subset of your data and returning results as we encounter them.  Consider the following strings in lexicographic order:
...
ROSE
...
ROSS
...
ROSTA
...
Rose
...
Ross
...
Rosta
...
rose
...
ross
...
rosta
...

As you can see there could be an unlimited number of rows in between 'ROSS' and 'Ross', and also between 'Ross' and 'ross' (okay not unlimited, we do have a limit on indexed string length, but definitely Large), so if we start scanning at 'ROSS' we might have to skip a huge number of results before we hit 'Ross', and that doesn't scale.  This is why the datastore doesn't support case-insensitive queries.

Fortunately it's not hard to implement support for case-insensitive queries in your application.  Here's the approach: for each field that you want to query in a case-insensitive way, create a duplicate field that stores the value of that field in either all upper or lowercase letters.  Then, have your model object register a pre-persist callback with your persistence framework (JPA or JDO), and then populate the duplicate fields inside that callback.  Here's what it looks like.

JPA:
import javax.persistence.PrePersist;
import javax.persistence.PreUpdate;
 

@Entity
public class Person {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  private String lastName; 
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }


  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  @PrePersist
  @PreUpdate
  public void prePersist() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  } 

}
 
public List<Person> getPeopleByLastName(String lastName) {
  Query q = em.createQuery("select from Person where uppercaseLastName = :p");
  q.setParameter("p", lastName.toUpperCase());
  return (List<Person>) q.getResultList(); 

}


JDO:
import javax.jdo.listener.StoreCallback;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Person implements StoreCallback {
  @PrimaryKey
  @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
  private Long id;
  private String lastName;
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }

  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  public void jdoPreStore() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  }
}

public List<Person> getPeopleByLastName(String lastName) { 
  Query q = pm.newQuery(Person.class, "uppercaseLastName == :p"); 
  return (List<Person>) q.execute(lastName.toUpperCase()); 
}

Note that in both examples I have not created a getter or a setter for the uppercaseLastName field.  That's because this field is part of the interface we're exposing to the persistence framework but not part of the public interface of the model object.  Your query code can filter by the uppercaseLastName field but the client code that interacts with the Person object has no reason to know that there are duplicate fields that are used to support case-insensitive queries.

Wednesday, November 4, 2009

Unindexed Properties

Did you know that, by default, the App Engine Datastore writes two index records for every entity property that isn't a com.google.appengine.api.datastore.Blob or a com.google.appengine.api.datastore.Text?  It's true!  These index records allow you to execute a variety of queries involving the property without creating a composite index.

Now, these index records don't come for free.  They take time to write and they take up space on disk.  If you have a property that you're absolutely positively sure you'll never want to filter or sort by, you can opt-out of the default indexing that is going on.  Let's look at an example.  We'll use a Book class with a com.google.appengine.api.datastore.Link property that is the URL of an image of the Book's cover.  Our assumption here is that we don't need to select Books based on the value of this property and we don't need to sort them based on the value of this property.


JPA:
@Entity
public class Book {
  @Id
  @GeneratedValue(strategy=GenerationType.IDENTITY)
  private Long id;

  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}


JDO:
@PersistenceCapable(identityType=IdentityType.APPLICATION)
public class Book {
  @PrimaryKey
  @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
  private Long id;

  @Persistent
  @Extension(vendorName="datanucleus", key="gae.unindexed", value="true")
  private Link coverImageUrl;

  private String title;

  // getters and setters
}


In both examples, the "gae.unindexed" extension tells App Engine that you want to opt out of the default indexing.  Now remember, requirements change over time.  Just because you're not filtering or sorting by a property today doesn't mean you won't filter or sort by that property tomorrow, so think hard before you choose to mark something as 'unindexed.'

Of course, no matter how well you've planned you'll find yourself in a situation where you have existing data and you need to change a property from indexed to unindexed or from unindexed to indexed.  What do you do?
Let's start with the easy one - going from indexed to unindexed.

If you change a property from indexed to unindexed, the index records for all existing entities with that property will continue to exist until you update (or delete) the entity with which those index records are associated.  Any new entities will be created without the index records for the newly unindexed property.  As long as you've taken care to purge your code of all queries that filter or sort by the property that is now unindexed you'll be ready to go as soon as you upload your new application version.

Changing a property from unindexed to indexed is more difficult.  Presumably you're doing this because you need to filter or sort by that property in your queries, and since entities without index records for the filter and sort properties are automatically excluded from the result set, the only way that your queries involving the newly indexed property are going to return the results you expect is if you create index records for all the entities that existed before you made the property indexed.  Yikes.  I told you to think hard, didn't I?

So what do you do?  Well, the missing index records will be created whenever you rewrite an entity, so you'll need to map over all entities of the appropriate kind and "touch" each one (fetch it then put it).  You can use the Task Queue to break this work up, and when Datastore Cursors are released (coming soon!) this will be even easier.  Still, you've got some work to do.

My final warning: Unindexed properties are an optimization, and just like any optimization it is possible to invoke it prematurely.  Using unindexed properties will speed up writes and reduce disk usage, but will it speed up writes and reduce disk usage enough to make the optimization worthwhile?  I can't answer that question for you.  The impact will depend on how much data you have and how often you're writing it.

Wednesday, October 21, 2009

Optimistic Locking With @Version

The datastore does optimistic locking, so if you update an entity inside a transaction and another request commits an update to either that same entity or some other entity in the same entity group before you commit your transaction, you will get an exception.  This keeps your data from getting stomped on when there are concurrent requests.  However, in the web-app world where clients can and typically do maintain state across requests, there is a far more likely scenario in which your data can get stomped on.  Consider the following scenario involving a bug tracking system:

Time 1: User A brings up the details page for Bug 838.
Time 2: User B brings up the details page for Bug 838.
Time 3: User A assigns Bug 838 to Jim.
Time 4: User B assigns Bug 838 to Sally.

Even though User A and User B updated Bug 838 at different times, User A's update was effectively stomped on by User B.  Sure, the fact that Bug 838 was assigned to Jim at some point may be in the history for the Bug, we can't change the fact that User B made a decision to assign Bug 838 based on out-of-date information.  In a bug tracking system this might not be such a big deal, but if you're doing something like compensation planning you'd much rather have your users receive an exception when they make an update based on out-of-date information.  Fortunately this is easy to implement using JPA/JDO on top of App Engine.  Let's use a Person object with a 'salary' property as an example.

JPA:
@Entity
public class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Long id;

    private int salary;

    @Version
    private long version;

    // ...getters and setters
}

public void updateSalary(EntityManager em, Person p, int newSalary) {
    em.getTransaction().begin();
    try {
        p.setSalary(newSalary);
        p = em.merge(p);
        em.getTransaction().commit();
    } catch (RollbackException e) {
        if (e.getCause() instanceof OptimisticLockException) {
            handleVersionConflict(e.getCause(), p);
        } else {
            throw e;
        }
    } finally {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
    }
}

JDO:

@PersistenceCapable(identityType=IdentityType.APPLICATION)
@Version(strategy=VersionStrategy.VERSION_NUMBER)
public class Person {
    @PrimaryKey
    @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
    private Long id;

    private int salary;


    // ...getters and setters
}

public void updateSalary(PersistenceManager pm, Person p, int newSalary) {
    pm.currentTransaction().begin();
    try {
        p.setSalary(newSalary);
        pm.makePersistent(p);
        pm.currentTransaction().commit();
    } catch (JDOOptimisticVerificationException e) {
        handleVersionConflict(e, p);
    } finally {        
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }
    }
}

If you declare a Version field on your model object, JDO/JPA will compare the value of that field on the instance you are updating with the value of that field in the datastore.  If the version numbers are equal the version number will be incremented and your model object will be persisted.   If the version numbers are not equal the appropriate exception will be thrown.  In the above examples I've caught this exception in the update code itself, but this is really just to illustrate what's going on.  In practice I would most likely let this exception propagate out of the update method and handle it at a higher level, perhaps even as part of a generic exception handler in my presentation code.

Note that there is a performance cost to using this feature.  The datastore does not support updates with predicates the way relational databases do ('update person where version = 3') so in order to perform the version comparison it needs to do an additional get() for the entity that corresponds to your model object.  You'll have to decide for yourself whether or not the cost of the additional fetch is worthwhile.

Now, if you've looked at both the JPA and JDO examples you may have noticed that JPA requires the declaration of an explicit field annotated with @Version while JDO accepts an @Version annotation on the class without an explicit field.  If you are using JDO and would like the version stored in an explicit field you can use a DataNucleus-specific extension to tell JDO which field to use.  You can see an example of this here.

Using JPA and JDO's built-in versioning checks is an easy way to help users of your app make decisions based on up-to-date information.  Try it out!

Thursday, October 15, 2009

Serialized Fields

(This was originally posted to the Google App Java Google Group on October 14, 2009)

Suppose one of your model objects has a field that isn't a native datastore type like String or Date and isn't another model object, like the child of an owned OneToOne relationship.  Oftentimes you can use an Embedded class to store that field's sub-fields in the same record as the containing record, but what if an Embedded class isn't sufficient?  This is where Serialized Fields come in.  With JDO and JPA it's possible to store any class that implements java.io.Serializable in a single property.

For our example lets model Person, ContactProfiles, ContactProfile, and PhoneNumber.  A Person has exactly one ContactProfiles instance, a ContactProfiles has some number of ContactProfile instances, and a ContactProfile has some number of PhoneNumber instances.  Our application is such that when we retrieve a Person from the datastore it always makes sense to load their ContactProfiles as well.  We'll define ContactProfiles, ContactProfile, and PhoneNumber first since they're the same whether we're using JPA or JDO:

public static class ContactProfiles implements Serializable {
    private final
List<ContactProfile> profiles;

    public ContactProfiles(List<ContactProfile> profiles) {
      this.profiles = profiles;
    }

    public
List<ContactProfile> get() {
      return profiles;
    }
}

public class ContactProfile implements Serializable {
    private String profileName;
    private
List<PhoneNumber> phoneNumbers;

    // getters and setters
}

public class PhoneNumber implements Serializable {
    private String type;
    private String number;

    // getters and setters
}


The JPA Person definition:
@Entity
public class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    @Lob
    private ContactProfiles contactProfiles;

    // getters and setters
}


The JDO Person definition:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Person { 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    @Persistent(serialized = "true")
    private ContactProfiles contactProfiles;

    // getters and setters
}


In this example the App Engine JDO/JPA implementation converts the Person.contactProfiles field into a com.google.appengine.api.datastore.Blob using standard Java serialization.  This Blob is then stored as a single property on the Person Entity.  When you fetch a Person the corresponding Entity is retrieved and the Blob is deserialized in just a single datastore rpc, leaving you with a ContactProfiles instance and all its associated sub-objects.

There are a few drawbacks to this approach that you should be aware of.  First, even though your application understands the structure of the bytes that make up the ContactProfiles, the datastore does not.  As a result you can't filter or sort on any of the properties of ContactProfile or PhoneNumber.  Second, all the known gotchas associated with Java serialization apply here.  If you evolve your Serializable classes make sure you do it in a backwards compatible way!  Third, you might be looking at the definition of the ContactProfiles class and thinking "Why are we bothering with a ContactProfiles class when Person could just store List<ContactProfile>?"  The reason is that updating a serialized field is not as simple as updating other types of fields, and this extra layer of indirection makes things a bit easier.  DataNucleus has no way of "seeing" the changes you make to the internal values of a serialized field, and as a result it doesn't know when it needs to flush changes to these fields to the datastore.  However, DataNucleus can see when the top-level serialized field reference changes.  Giving Person a reference to a ContactProfiles object makes it easy for us to change ContactProfiles in a way that DataNucleus is guaranteed to recognize:

public void addContactProfile(Person p, ContactProfile cp) {
    // update our serialized field
    p.getContactProfiles().get().add(cp);
    // give the person a new ContactProfiles reference so the update is detected
    p.setContactProfiles(new ContactProfiles(p.getContactProfiles().get()));
}

The last line where we set a new ContactProfiles instace back on the Person is the key and it's easy to miss, so be careful!  If you forget this step your updates will be ignored and you will be sad.

Despite the limitations I've listed, Serialized fields are a good way to store structured data inside the record of a containing Entity.  As long as you can get by without filtering or sorting on this data and you remember that it has special update requirements, serialized fields will almost certainly come in handy at some point.

Keys Only Queries

(This was originally posted to the Google App Java Google Group on October 6, 2009)

If you use the low-level datastore api you may have noticed that the com.google.appengine.api.datastore.Query class has a setKeysOnly() method.  If you call this method before you execute the query the datastore will return com.google.appengine.api.datastore.Entity instances that have their keys filled in but none of their properties.  This can reduce consumption of your Datastore Data Received from API quota, especially if you've got some large entities, but, more importantly, it can also reduce consumption of your Datastore CPU Time quota.  How?  Well, if the fulfillment of your query requires an index or a merge join your query actually executes in two stages:  First it scans the index to find the keys of the entities that match and then it issues additional scans to retrieve the entities uniquely identified by the matching keys.  If your query is keys-only we can skip that second step entirely.  That means faster queries!

Now, JPA and JDO don't know anything about keys-only queries, but they do give you the flexibility to either return your entire object or some subset of the fields on your object.  If you construct this subset to only contain the primary key of your object, the App Engine implementation of JPA and JDO will use a keys-only query.  Let's look at some examples:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private Date dateOfPublication;

    // getters and setters

}


Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

public List keysOfBooksPublishedBetween(EntityManager em, Date from, Date to) {
    em.getTransaction().begin();
    try {
        Query q = em.createQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from AND dateOfPublication <= :to");
        q.setParameter("from", from);
        q.setParameter("to", to);
        return (List) q.getResultList();
    } finally {
        em.getTransaction().rollback();
    }
}


JDO:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private Date dateOfPublication;


    // getters and setters
}

Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):


public List keysOfBooksPublishedBetween(PersistenceManager pm, Date from, Date to) {
    pm.currentTransaction().begin();
    try {
        Query q = pm.newQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from && dateOfPublication <= :to");
        return (List) q.execute(from, to);
    } finally {
        pm.currentTransaction().rollback();
    }
}
--------------------------------
Notice how we are only selecting the 'id' field from our Book class.  This is crucial.  If you select any other fields your query will end up fetching entire entities and the optimization will be lost.

Updating A Bidrectional Owned One-To-Many With A New Child

(This was originally posted to the Google App Java Google Group on September 28, 2009)  

All the way back in our first post one we demonstrated how to create both a parent and a child of a bidirectional, owned, one-to-many relationship at the same time.  This week we're going to see how to add a child to an existing parent.  We'll use the same model objects we used before:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;


    @OneToMany(mappedBy = "book", cascade = CascadeType.ALL)

    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}

@Entity
public class Chapter {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @ManyToOne(fetch = FetchType.LAZY)
    private Book book;

    // getters and setters
}

Now let's assume we've already created a book with a few chapters in the datastore and we want to add a brand new chapter to a Book with a given id (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

public void addChapterToBook(EntityManager em, Key bookKey, Chapter chapter) {
    em.getTransaction().begin();
    try {
        Book b = em.find(Book.class, bookKey);
        if (b == null) {
            throw new RuntimeException("Book " + bookKey + " not found!");
        }
        b.getChapters().add(chapter);

        em.getTransaction().commit();
    } finally {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
    }
}


JDO:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;


    @Persistent(mappedBy = "book")
    @Element(dependent = "true")
    @Order(extensions = @Extension(vendorName="datanucleus", key="list-ordering", value="id asc"))
    private List<Chapter> chapters = new ArrayList<Chapter>();


    // getters and setters
}

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")

public class Chapter {
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @Persistent
    private Book book;

    // getters and setters

}

Now let's assume we've already created a book with a few chapters in the datastore and we want to add a brand new chapter to a Book with a given id (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):


public void addChapterToBook(PersistenceManager pm, Key bookKey, Chapter chapter) {
    pm.currentTransaction().begin();
    try {
        // throws a runtime exception if book is not found       
        Book b = pm.getObjectById(Book.class, bookKey);
        b.getChapters().add(chapter);

        pm.currentTransaction().commit();
    } finally {
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }
    }
}
--------------------------------

The interesting thing about both of these examples is that we're not making any explicit calls to save the new Chapter.  We look up the Book identified by the Key that was passed into the function and then we manipulate the persistent state of the object by manipulating the POJO that was returned by em.fetch/pm.getObjectById.  JPA and JDO both have mechanisms that allow them to monitor the objects that you've looked up for changes.  Ever wonder what exactly the enhancer is doing to your classes?  It's adding hooks so that the persistence framework gets notified when things change (among other things).  This allows JPA and JDO to automatically flush your changes to the datastore when you commit your transaction.  If you wanted to modify the title of the Book or the number of pages in an existing Chapter the approach would be exactly the same: Start a transaction, look up the Book, make your changes, commit your transaction.  Whether you're using JPA or JDO your changes will be persisted for you without any explicit calls to change the persistent state.  This is a prime example of how JPA and JDO facilitate "transparent persistence."

Executing Batch Gets

(This was originally posted to the Google App Java Google Group on September 21, 2009) 

Did you know that the App Engine datastore supports batch gets?  Batch gets are a super-efficient way to load multiple entities when you already have the keys of the entities you want loaded.  Here's an example using the low-level Datastore API:

  public Map getById(List keys) {
      DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
      return ds.get(keys);        
  }

Now lets see how we can accomplish the same thing in JPA and JDO:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}



public List getById(List keys) {
    Query q = em.createQuery("select from " + Book.class.getName() + " where key = :keys");
    q.setParameter("keys", keys);
    return (List) q.getResultList();
}


JDO:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Book {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}


public List getById(List keys) {
    Query q = pm.newQuery("select from " + Book.class.getName() + " where key == :keys");
    return (List) q.execute(keys);
}

Notice how in both examples we're constructing a query to pull back the entities by key.  The App Engine JDO/JPA implementation detects queries that are filtering only by key and fulfills them using a low level batch get rather than a datstore query.  This works no matter the type of your primary key field, so whether you're using a Long, an unencoded String, an encoded String, or a Key, the same technique will work.  However, even though this looks like a query, all the fetch-related transaction restrictions apply: if you're executing your batch get query inside a txn, all of the entities you're attempting to fetch must belong to the same entity group or you'll get an exception. Be careful with this.

The next time you need to pull back multiple entities and you already have their keys, issue a query that filters only by your object's key field and reap the benefits of the datastore's optimized batch get mechanism.

Creating A Bidrectional Owned One-To-Many

(This was originally posted to the Google App Java Google Group on September 14, 2009)

Suppose you're building a book catalog application and you want to model books and chapters.  Books contain chapters.  A chapter cannot exist without a book, so if you delete a book you want its chapters automatically deleted along with it.  You also want to each chapter to have a reference to the book that owns it.  Sounds like a bidrectional, owned, one-to-many relationship is just the thing.  First we'll set up our model objects and then we'll add some code to create a Book with 2 Chapters.

JPA:
import com.google.appengine.api.datastore.Key;
//...

@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;


    @OneToMany(mappedBy = "book", cascade = CascadeType.ALL)

    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}


@Entity
public class Chapter {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @ManyToOne(fetch = FetchType.LAZY)
    private Book book;

    // getters and setters
}

Now let's create a book with two chapters (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

Book b = new Book();
b.setTitle("JPA 4eva");
Chapter c1 = new Chapter();
c1.setTitle("Intro");
c1.setNumPages(10);
b.getChapters().add(c1);
Chapter c2 = new Chapter();
c2.setTitle("Configuration");
c2.setNumPages(9);
b.getChapters().add(c2);

em.getTransaction().begin();
try {
    em.persist(b);
    em.getTransaction().commit();
} finally {
    if (em.getTransaction().isActive()) {
        em.getTransaction().rollback();
    }
}


JDO:

import com.google.appengine.api.datastore.Key;
//...

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;


    @Persistent(mappedBy = "book")
    @Element(dependent = "true")
    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")

public class Chapter {
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @Persistent
    private Book book;

    // getters and setters

}

Now let's create a book with two chapters (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):

Book b = new Book();
b.setTitle("JDO 4eva");
Chapter c1 = new Chapter();
c1.setTitle("Intro");
c1.setNumPages(10);
b.getChapters().add(c1);
Chapter c2 = new Chapter();
c2.setTitle("Configuration");
c2.setNumPages(9);
b.getChapters().add(c2);

pm.currentTransaction().begin();
try {
    pm.makePersistent(b);
    pm.currentTransaction().commit();
} finally {
    if (pm.currentTransaction().isActive()) {
        pm.currentTransaction().rollback();
    }
}

Welcome to the GAE/J Persistence Blog!

Greetings everyone!  My name is Max Ross and I'm an engineer on the Google App Engine team who works on persistence stuff: the datastore, the low-level datastore api, and the JDO/JPA implementation.  I'll be your host.

GAE for Java has been available since early April 2009, and since launch we've fixed a lot of bugs and added a lot of features related to JDO and JPA.  However, I read the GAE Java group regularly and I see the persistence questions being asked, and my impression is that we should be giving you, our customers, more guidance.  This blog is my attempt to provide that additional guidance.

Each week I'm going to post a complete example demonstrating how to accomplish a specific persistence-related task.  We don't play favorites here on the App Engine team so I'll construct the example in both JDO and JPA, and I'll try to cover a wide range of topics to keep everyone interested.  My hope is that, over time, this will come to serve as a valuable resource to all App Engine Java developers whether you're new to JDO/JPA or not.

Thanks for reading!