Wednesday, October 21, 2009

Optimistic Locking With @Version

The datastore does optimistic locking, so if you update an entity inside a transaction and another request commits an update to either that same entity or some other entity in the same entity group before you commit your transaction, you will get an exception.  This keeps your data from getting stomped on when there are concurrent requests.  However, in the web-app world where clients can and typically do maintain state across requests, there is a far more likely scenario in which your data can get stomped on.  Consider the following scenario involving a bug tracking system:

Time 1: User A brings up the details page for Bug 838.
Time 2: User B brings up the details page for Bug 838.
Time 3: User A assigns Bug 838 to Jim.
Time 4: User B assigns Bug 838 to Sally.

Even though User A and User B updated Bug 838 at different times, User A's update was effectively stomped on by User B.  Sure, the fact that Bug 838 was assigned to Jim at some point may be in the history for the Bug, we can't change the fact that User B made a decision to assign Bug 838 based on out-of-date information.  In a bug tracking system this might not be such a big deal, but if you're doing something like compensation planning you'd much rather have your users receive an exception when they make an update based on out-of-date information.  Fortunately this is easy to implement using JPA/JDO on top of App Engine.  Let's use a Person object with a 'salary' property as an example.

JPA:
@Entity
public class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Long id;

    private int salary;

    @Version
    private long version;

    // ...getters and setters
}

public void updateSalary(EntityManager em, Person p, int newSalary) {
    em.getTransaction().begin();
    try {
        p.setSalary(newSalary);
        p = em.merge(p);
        em.getTransaction().commit();
    } catch (RollbackException e) {
        if (e.getCause() instanceof OptimisticLockException) {
            handleVersionConflict(e.getCause(), p);
        } else {
            throw e;
        }
    } finally {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
    }
}

JDO:

@PersistenceCapable(identityType=IdentityType.APPLICATION)
@Version(strategy=VersionStrategy.VERSION_NUMBER)
public class Person {
    @PrimaryKey
    @Persistent(valueStrategy=IdGeneratorStrategy.IDENTITY)
    private Long id;

    private int salary;


    // ...getters and setters
}

public void updateSalary(PersistenceManager pm, Person p, int newSalary) {
    pm.currentTransaction().begin();
    try {
        p.setSalary(newSalary);
        pm.makePersistent(p);
        pm.currentTransaction().commit();
    } catch (JDOOptimisticVerificationException e) {
        handleVersionConflict(e, p);
    } finally {        
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }
    }
}

If you declare a Version field on your model object, JDO/JPA will compare the value of that field on the instance you are updating with the value of that field in the datastore.  If the version numbers are equal the version number will be incremented and your model object will be persisted.   If the version numbers are not equal the appropriate exception will be thrown.  In the above examples I've caught this exception in the update code itself, but this is really just to illustrate what's going on.  In practice I would most likely let this exception propagate out of the update method and handle it at a higher level, perhaps even as part of a generic exception handler in my presentation code.

Note that there is a performance cost to using this feature.  The datastore does not support updates with predicates the way relational databases do ('update person where version = 3') so in order to perform the version comparison it needs to do an additional get() for the entity that corresponds to your model object.  You'll have to decide for yourself whether or not the cost of the additional fetch is worthwhile.

Now, if you've looked at both the JPA and JDO examples you may have noticed that JPA requires the declaration of an explicit field annotated with @Version while JDO accepts an @Version annotation on the class without an explicit field.  If you are using JDO and would like the version stored in an explicit field you can use a DataNucleus-specific extension to tell JDO which field to use.  You can see an example of this here.

Using JPA and JDO's built-in versioning checks is an easy way to help users of your app make decisions based on up-to-date information.  Try it out!

Thursday, October 15, 2009

Serialized Fields

(This was originally posted to the Google App Java Google Group on October 14, 2009)

Suppose one of your model objects has a field that isn't a native datastore type like String or Date and isn't another model object, like the child of an owned OneToOne relationship.  Oftentimes you can use an Embedded class to store that field's sub-fields in the same record as the containing record, but what if an Embedded class isn't sufficient?  This is where Serialized Fields come in.  With JDO and JPA it's possible to store any class that implements java.io.Serializable in a single property.

For our example lets model Person, ContactProfiles, ContactProfile, and PhoneNumber.  A Person has exactly one ContactProfiles instance, a ContactProfiles has some number of ContactProfile instances, and a ContactProfile has some number of PhoneNumber instances.  Our application is such that when we retrieve a Person from the datastore it always makes sense to load their ContactProfiles as well.  We'll define ContactProfiles, ContactProfile, and PhoneNumber first since they're the same whether we're using JPA or JDO:

public static class ContactProfiles implements Serializable {
    private final
List<ContactProfile> profiles;

    public ContactProfiles(List<ContactProfile> profiles) {
      this.profiles = profiles;
    }

    public
List<ContactProfile> get() {
      return profiles;
    }
}

public class ContactProfile implements Serializable {
    private String profileName;
    private
List<PhoneNumber> phoneNumbers;

    // getters and setters
}

public class PhoneNumber implements Serializable {
    private String type;
    private String number;

    // getters and setters
}


The JPA Person definition:
@Entity
public class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    @Lob
    private ContactProfiles contactProfiles;

    // getters and setters
}


The JDO Person definition:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Person { 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    @Persistent(serialized = "true")
    private ContactProfiles contactProfiles;

    // getters and setters
}


In this example the App Engine JDO/JPA implementation converts the Person.contactProfiles field into a com.google.appengine.api.datastore.Blob using standard Java serialization.  This Blob is then stored as a single property on the Person Entity.  When you fetch a Person the corresponding Entity is retrieved and the Blob is deserialized in just a single datastore rpc, leaving you with a ContactProfiles instance and all its associated sub-objects.

There are a few drawbacks to this approach that you should be aware of.  First, even though your application understands the structure of the bytes that make up the ContactProfiles, the datastore does not.  As a result you can't filter or sort on any of the properties of ContactProfile or PhoneNumber.  Second, all the known gotchas associated with Java serialization apply here.  If you evolve your Serializable classes make sure you do it in a backwards compatible way!  Third, you might be looking at the definition of the ContactProfiles class and thinking "Why are we bothering with a ContactProfiles class when Person could just store List<ContactProfile>?"  The reason is that updating a serialized field is not as simple as updating other types of fields, and this extra layer of indirection makes things a bit easier.  DataNucleus has no way of "seeing" the changes you make to the internal values of a serialized field, and as a result it doesn't know when it needs to flush changes to these fields to the datastore.  However, DataNucleus can see when the top-level serialized field reference changes.  Giving Person a reference to a ContactProfiles object makes it easy for us to change ContactProfiles in a way that DataNucleus is guaranteed to recognize:

public void addContactProfile(Person p, ContactProfile cp) {
    // update our serialized field
    p.getContactProfiles().get().add(cp);
    // give the person a new ContactProfiles reference so the update is detected
    p.setContactProfiles(new ContactProfiles(p.getContactProfiles().get()));
}

The last line where we set a new ContactProfiles instace back on the Person is the key and it's easy to miss, so be careful!  If you forget this step your updates will be ignored and you will be sad.

Despite the limitations I've listed, Serialized fields are a good way to store structured data inside the record of a containing Entity.  As long as you can get by without filtering or sorting on this data and you remember that it has special update requirements, serialized fields will almost certainly come in handy at some point.

Keys Only Queries

(This was originally posted to the Google App Java Google Group on October 6, 2009)

If you use the low-level datastore api you may have noticed that the com.google.appengine.api.datastore.Query class has a setKeysOnly() method.  If you call this method before you execute the query the datastore will return com.google.appengine.api.datastore.Entity instances that have their keys filled in but none of their properties.  This can reduce consumption of your Datastore Data Received from API quota, especially if you've got some large entities, but, more importantly, it can also reduce consumption of your Datastore CPU Time quota.  How?  Well, if the fulfillment of your query requires an index or a merge join your query actually executes in two stages:  First it scans the index to find the keys of the entities that match and then it issues additional scans to retrieve the entities uniquely identified by the matching keys.  If your query is keys-only we can skip that second step entirely.  That means faster queries!

Now, JPA and JDO don't know anything about keys-only queries, but they do give you the flexibility to either return your entire object or some subset of the fields on your object.  If you construct this subset to only contain the primary key of your object, the App Engine implementation of JPA and JDO will use a keys-only query.  Let's look at some examples:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private Date dateOfPublication;

    // getters and setters

}


Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

public List keysOfBooksPublishedBetween(EntityManager em, Date from, Date to) {
    em.getTransaction().begin();
    try {
        Query q = em.createQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from AND dateOfPublication <= :to");
        q.setParameter("from", from);
        q.setParameter("to", to);
        return (List) q.getResultList();
    } finally {
        em.getTransaction().rollback();
    }
}


JDO:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private Date dateOfPublication;


    // getters and setters
}

Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):


public List keysOfBooksPublishedBetween(PersistenceManager pm, Date from, Date to) {
    pm.currentTransaction().begin();
    try {
        Query q = pm.newQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from && dateOfPublication <= :to");
        return (List) q.execute(from, to);
    } finally {
        pm.currentTransaction().rollback();
    }
}
--------------------------------
Notice how we are only selecting the 'id' field from our Book class.  This is crucial.  If you select any other fields your query will end up fetching entire entities and the optimization will be lost.

Updating A Bidrectional Owned One-To-Many With A New Child

(This was originally posted to the Google App Java Google Group on September 28, 2009)  

All the way back in our first post one we demonstrated how to create both a parent and a child of a bidirectional, owned, one-to-many relationship at the same time.  This week we're going to see how to add a child to an existing parent.  We'll use the same model objects we used before:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;


    @OneToMany(mappedBy = "book", cascade = CascadeType.ALL)

    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}

@Entity
public class Chapter {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @ManyToOne(fetch = FetchType.LAZY)
    private Book book;

    // getters and setters
}

Now let's assume we've already created a book with a few chapters in the datastore and we want to add a brand new chapter to a Book with a given id (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

public void addChapterToBook(EntityManager em, Key bookKey, Chapter chapter) {
    em.getTransaction().begin();
    try {
        Book b = em.find(Book.class, bookKey);
        if (b == null) {
            throw new RuntimeException("Book " + bookKey + " not found!");
        }
        b.getChapters().add(chapter);

        em.getTransaction().commit();
    } finally {
        if (em.getTransaction().isActive()) {
            em.getTransaction().rollback();
        }
    }
}


JDO:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;


    @Persistent(mappedBy = "book")
    @Element(dependent = "true")
    @Order(extensions = @Extension(vendorName="datanucleus", key="list-ordering", value="id asc"))
    private List<Chapter> chapters = new ArrayList<Chapter>();


    // getters and setters
}

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")

public class Chapter {
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @Persistent
    private Book book;

    // getters and setters

}

Now let's assume we've already created a book with a few chapters in the datastore and we want to add a brand new chapter to a Book with a given id (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):


public void addChapterToBook(PersistenceManager pm, Key bookKey, Chapter chapter) {
    pm.currentTransaction().begin();
    try {
        // throws a runtime exception if book is not found       
        Book b = pm.getObjectById(Book.class, bookKey);
        b.getChapters().add(chapter);

        pm.currentTransaction().commit();
    } finally {
        if (pm.currentTransaction().isActive()) {
            pm.currentTransaction().rollback();
        }
    }
}
--------------------------------

The interesting thing about both of these examples is that we're not making any explicit calls to save the new Chapter.  We look up the Book identified by the Key that was passed into the function and then we manipulate the persistent state of the object by manipulating the POJO that was returned by em.fetch/pm.getObjectById.  JPA and JDO both have mechanisms that allow them to monitor the objects that you've looked up for changes.  Ever wonder what exactly the enhancer is doing to your classes?  It's adding hooks so that the persistence framework gets notified when things change (among other things).  This allows JPA and JDO to automatically flush your changes to the datastore when you commit your transaction.  If you wanted to modify the title of the Book or the number of pages in an existing Chapter the approach would be exactly the same: Start a transaction, look up the Book, make your changes, commit your transaction.  Whether you're using JPA or JDO your changes will be persisted for you without any explicit calls to change the persistent state.  This is a prime example of how JPA and JDO facilitate "transparent persistence."

Executing Batch Gets

(This was originally posted to the Google App Java Google Group on September 21, 2009) 

Did you know that the App Engine datastore supports batch gets?  Batch gets are a super-efficient way to load multiple entities when you already have the keys of the entities you want loaded.  Here's an example using the low-level Datastore API:

  public Map getById(List keys) {
      DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
      return ds.get(keys);        
  }

Now lets see how we can accomplish the same thing in JPA and JDO:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}



public List getById(List keys) {
    Query q = em.createQuery("select from " + Book.class.getName() + " where key = :keys");
    q.setParameter("keys", keys);
    return (List) q.getResultList();
}


JDO:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Book {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}


public List getById(List keys) {
    Query q = pm.newQuery("select from " + Book.class.getName() + " where key == :keys");
    return (List) q.execute(keys);
}

Notice how in both examples we're constructing a query to pull back the entities by key.  The App Engine JDO/JPA implementation detects queries that are filtering only by key and fulfills them using a low level batch get rather than a datstore query.  This works no matter the type of your primary key field, so whether you're using a Long, an unencoded String, an encoded String, or a Key, the same technique will work.  However, even though this looks like a query, all the fetch-related transaction restrictions apply: if you're executing your batch get query inside a txn, all of the entities you're attempting to fetch must belong to the same entity group or you'll get an exception. Be careful with this.

The next time you need to pull back multiple entities and you already have their keys, issue a query that filters only by your object's key field and reap the benefits of the datastore's optimized batch get mechanism.

Creating A Bidrectional Owned One-To-Many

(This was originally posted to the Google App Java Google Group on September 14, 2009)

Suppose you're building a book catalog application and you want to model books and chapters.  Books contain chapters.  A chapter cannot exist without a book, so if you delete a book you want its chapters automatically deleted along with it.  You also want to each chapter to have a reference to the book that owns it.  Sounds like a bidrectional, owned, one-to-many relationship is just the thing.  First we'll set up our model objects and then we'll add some code to create a Book with 2 Chapters.

JPA:
import com.google.appengine.api.datastore.Key;
//...

@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;


    @OneToMany(mappedBy = "book", cascade = CascadeType.ALL)

    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}


@Entity
public class Chapter {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @ManyToOne(fetch = FetchType.LAZY)
    private Book book;

    // getters and setters
}

Now let's create a book with two chapters (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

Book b = new Book();
b.setTitle("JPA 4eva");
Chapter c1 = new Chapter();
c1.setTitle("Intro");
c1.setNumPages(10);
b.getChapters().add(c1);
Chapter c2 = new Chapter();
c2.setTitle("Configuration");
c2.setNumPages(9);
b.getChapters().add(c2);

em.getTransaction().begin();
try {
    em.persist(b);
    em.getTransaction().commit();
} finally {
    if (em.getTransaction().isActive()) {
        em.getTransaction().rollback();
    }
}


JDO:

import com.google.appengine.api.datastore.Key;
//...

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {
 
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;


    @Persistent(mappedBy = "book")
    @Element(dependent = "true")
    private List<Chapter> chapters = new ArrayList<Chapter>();

    // getters and setters
}

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")

public class Chapter {
    @PrimaryKey

    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private String title;
    private int numPages;

    @Persistent
    private Book book;

    // getters and setters

}

Now let's create a book with two chapters (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):

Book b = new Book();
b.setTitle("JDO 4eva");
Chapter c1 = new Chapter();
c1.setTitle("Intro");
c1.setNumPages(10);
b.getChapters().add(c1);
Chapter c2 = new Chapter();
c2.setTitle("Configuration");
c2.setNumPages(9);
b.getChapters().add(c2);

pm.currentTransaction().begin();
try {
    pm.makePersistent(b);
    pm.currentTransaction().commit();
} finally {
    if (pm.currentTransaction().isActive()) {
        pm.currentTransaction().rollback();
    }
}

Welcome to the GAE/J Persistence Blog!

Greetings everyone!  My name is Max Ross and I'm an engineer on the Google App Engine team who works on persistence stuff: the datastore, the low-level datastore api, and the JDO/JPA implementation.  I'll be your host.

GAE for Java has been available since early April 2009, and since launch we've fixed a lot of bugs and added a lot of features related to JDO and JPA.  However, I read the GAE Java group regularly and I see the persistence questions being asked, and my impression is that we should be giving you, our customers, more guidance.  This blog is my attempt to provide that additional guidance.

Each week I'm going to post a complete example demonstrating how to accomplish a specific persistence-related task.  We don't play favorites here on the App Engine team so I'll construct the example in both JDO and JPA, and I'll try to cover a wide range of topics to keep everyone interested.  My hope is that, over time, this will come to serve as a valuable resource to all App Engine Java developers whether you're new to JDO/JPA or not.

Thanks for reading!