Thursday, October 15, 2009

Executing Batch Gets

(This was originally posted to the Google App Engine Java Google Group on September 21, 2009)

Did you know that the App Engine datastore supports batch gets?  Batch gets are a super-efficient way to load multiple entities when you already have the keys of the entities you want loaded.  Here's an example using the low-level Datastore API:

  public Map<Key, Entity> getById(List<Key> keys) {
      DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
      // A single batch get for all of the keys
      return ds.get(keys);
  }
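
A practical detail when consuming the returned map: as far as the low-level API documentation describes it, the map only contains entries for keys whose entities were found, and it is safest not to rely on its iteration order. The following is a minimal sketch, in plain Java with no App Engine dependency, of two helpers for dealing with that. BatchGetResults, inRequestOrder and missing are hypothetical names introduced for illustration; they are not part of the SDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the App Engine SDK.
final class BatchGetResults {
    private BatchGetResults() {}

    /** Returns the found values in the order the keys were requested, skipping misses. */
    static <K, V> List<V> inRequestOrder(List<K> requested, Map<K, V> found) {
        List<V> ordered = new ArrayList<>();
        for (K key : requested) {
            V value = found.get(key);
            if (value != null) {
                ordered.add(value);
            }
        }
        return ordered;
    }

    /** Returns the requested keys that have no entry in the result map. */
    static <K, V> List<K> missing(List<K> requested, Map<K, V> found) {
        List<K> notFound = new ArrayList<>();
        for (K key : requested) {
            if (!found.containsKey(key)) {
                notFound.add(key);
            }
        }
        return notFound;
    }
}
```

With the low-level API you would pass the list you handed to ds.get() as requested and the map it returned as found.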

Now let's see how we can accomplish the same thing in JPA and JDO:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}



public List<Book> getById(List<Key> keys) {
    Query q = em.createQuery("select from " + Book.class.getName() + " where key = :keys");
    q.setParameter("keys", keys);
    return (List<Book>) q.getResultList();
}


JDO:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Book {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    private String title;


    // additional members, getters and setters
}


public List<Book> getById(List<Key> keys) {
    Query q = pm.newQuery("select from " + Book.class.getName() + " where key == :keys");
    return (List<Book>) q.execute(keys);
}

Notice how in both examples we're constructing a query to pull back the entities by key.  The App Engine JDO/JPA implementation detects queries that filter only by key and fulfills them using a low-level batch get rather than a datastore query.  This works no matter the type of your primary key field, so whether you're using a Long, an unencoded String, an encoded String, or a Key, the same technique will work.  However, even though this looks like a query, all the fetch-related transaction restrictions apply: if you're executing your batch get query inside a txn, all of the entities you're attempting to fetch must belong to the same entity group or you'll get an exception. Be careful with this.
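
Since the same-entity-group rule only bites inside a transaction, it can help to check the keys up front, or to split them by entity group. Below is a minimal sketch of that idea in plain Java. SimpleKey is a hypothetical stand-in for the datastore Key (it assumes a key's entity group is identified by its topmost ancestor), and EntityGroups, byEntityGroup and sameEntityGroup are illustrative names, not SDK APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Illustrative only: none of these types come from the App Engine SDK.
final class EntityGroups {
    private EntityGroups() {}

    /** Hypothetical stand-in for a datastore key: kind, name and optional parent. */
    static final class SimpleKey {
        final SimpleKey parent; // null for a root key
        final String kind;
        final String name;

        SimpleKey(SimpleKey parent, String kind, String name) {
            this.parent = parent;
            this.kind = kind;
            this.name = name;
        }

        /** Walks up the ancestor chain to the root key, which identifies the entity group. */
        SimpleKey root() {
            SimpleKey k = this;
            while (k.parent != null) {
                k = k.parent;
            }
            return k;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleKey)) {
                return false;
            }
            SimpleKey other = (SimpleKey) o;
            return Objects.equals(parent, other.parent)
                    && kind.equals(other.kind)
                    && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(parent, kind, name);
        }
    }

    /** Groups keys by entity group root, preserving the order in which keys were given. */
    static Map<SimpleKey, List<SimpleKey>> byEntityGroup(List<SimpleKey> keys) {
        Map<SimpleKey, List<SimpleKey>> groups = new LinkedHashMap<>();
        for (SimpleKey key : keys) {
            groups.computeIfAbsent(key.root(), r -> new ArrayList<>()).add(key);
        }
        return groups;
    }

    /** True if all keys share one entity group (also true for an empty list). */
    static boolean sameEntityGroup(List<SimpleKey> keys) {
        return byEntityGroup(keys).size() <= 1;
    }
}
```

A caller could use sameEntityGroup as a guard before issuing a batch get query inside a transaction, or iterate over byEntityGroup and fetch each group separately.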

The next time you need to pull back multiple entities and you already have their keys, issue a query that filters only by your object's key field and reap the benefits of the datastore's optimized batch get mechanism.

18 comments:

  1. JDOQL syntax is incorrect in this example

    http://code.google.com/p/datanucleus-appengine/issues/detail?id=149&colspec=ID%20Stars%20Type%20Status%20Priority%20FoundIn%20TargetRelease%20Owner%20Summary

  2. The syntax is not standard JDOQL but it compiles and behaves the way I've described it on App Engine. In the next SDK (1.2.8) you'll be able to do this with standard syntax:

    Query q = pm.newQuery("select from " + Book.class.getName() + " where :keys.contains(key)");

  3. It doesn't compile in current DataNucleus SVN (nor should it) so good that you're going to support proper syntax in the next release.

  4. It was a pleasant surprise to find out that delete (with JPA) also works in batch mode.

  5. I am able to query with
    Query q = em.createQuery("select c from com.test.Usertest c"); but when I try
    Query q = em.createQuery("select c from com.test.Usertest c where c.username = 'god'");
    I get getResultList().size() == 0. Can we query by an entity's property? Can we use 'select like'?

  6. You can query by an entity property like 'username' but this will execute as a normal query, not a batch get. Queries that only filter on the primary key field are the only ones that will execute as batch gets. Do you definitely have an entity in your datastore with a username property equal to 'god'?

    'select like' does not work. There has been a lot of discussion on the GAE Java group on this topic so I'd recommend reading about it there:

    http://groups.google.com/group/google-appengine-java

  7. I'm curious: would calling pm.getObjectsById(keys), where 'keys' holds a collection of ids, work the same as executing this query?
    Is one approach more efficient than the other?

  8. Right now getObjectsById() doesn't work at all but that's something we can probably address in a future release. If this feature is important to you please file an issue.

    http://code.google.com/p/datanucleus-appengine/issues/list

    Thanks!

    Max

  9. "if you're executing your batch get query inside a txn, all of the entities you're attempting to fetch must belong to the same entity group or you'll get an exception. Be careful with this."

    I have performed a similar query in a transaction on many Books, where each Book is in its own Entity Group, and it seems to work OK.

    Is it just when you try to modify the entities returned you would get the error?

    How else can you return all entities of kind Book, when they are not all in the same Entity Group?

    thanks
    Peter

  10. If you wanted all entities of a certain kind you'd probably issue a standard (non batch get) query. That's fine because non batch get, non-ancestor queries are not transactional and therefore incapable of raising an exception due to multiple entity groups. If you have the keys of all the entities you want you can issue a query like the one in the post to fetch them, but you can only issue that query inside a transaction if all the keys belong to the same entity group.

    I'm curious to know what your code looks like because the following test fails for me with an IllegalArgumentException due to multiple entity groups in a single transaction:

    List keys = Utils.newArrayList();
    for (int i = 0; i < 10; i++) {
        Flight f = new Flight();
        beginTxn();
        pm.makePersistent(f);
        commitTxn();
        keys.add(f.getId());
    }

    beginTxn();
    Query q = pm.newQuery("select from " + Flight.class.getName() + " where id == :keys");
    List flights = (List) q.execute(keys);
    flights.size();
    commitTxn();

    I'm guessing you're doing something that causes the query to issue as a regular query rather than a batch get.

  11. code is:

    for (int i = 0; i < 10; i++) {
        Book b = new Book();
        beginTxn();
        em.persist(b);
        commitTxn();
    }

    beginTxn();
    Query q = em.newQuery("select from " + Book.class.getName());
    List books = query.getResultList();
    books.size();
    commitTxn();

    So are the beginTxn() and commitTxn() calls redundant in this case,
    because the standard query ("select from " + Book.class.getName())
    ignores the surrounding transaction?

  12. Correct, the begin/commit statements don't impact the results of this query in any way.

  13. However,

    Without using a transaction, I found that owned child entities were not being returned with the parent.
    eg.


    Book b = new Book();
    b.getPages().add(new Page());
    em.persist(b);

    Query q = em.newQuery("select from " + Book.class.getName());
    List books = query.getResultList();
    books.size();
    assertEquals(1, book.getPages().size())

    this assert fails as the book.getPages().size() is empty (size 0)

    The way I got this to work was to wrap the query with a transaction.
    Why does this seem to need a transaction specified to have child entities populated?

  14. Calling em.persist(b) does not actually save the book or its children. If you're using transactions the objects are saved when you commit(). If you're not using transactions the objects are saved when you close the EntityManager. Your code above looks like it was put together from a few different sources (em does not have a newQuery method, book variable is undeclared) so I'm not positive what you're doing, but I do know that since you're not using transactions you shouldn't expect to see the book or its pages come back in a query until you've closed the EntityManager.

  15. "...if you're executing your batch get query inside a txn, all of the entities you're attempting to fetch must belong to the same entity group or you'll get an exception."

    Is there any way to perform batch gets (using Keys) for entities of the same kind that belong to different entity groups?

  16. Is there a way to 'batch put'? I know that JDO supports makePersistentAll() but what about JPA?

  17. I want to retrieve the entities whose keys are not contained in a key list. I got an error like javax.jdo.JDOFatalUserException: Batch lookup by primary key is only supported with the equality operator.
    at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:354)

  18. For some reason the syntax described above didn't work for me; I got a NullPointerException. However, this approach worked fine: http://stackoverflow.com/questions/10858197/appengine-jdo-batch-get-query
