JPA 2, JDO 3, unowned relationships, and tons of bug fixes.
Tuesday, November 1, 2011
DataNucleus App Engine Plugin v2.0.0-RC1 available for testing
Sunday, March 21, 2010
Executing Simple Joins Across Owned Relationships
As promised, today I'm going to explain how to use simple joins with owned relationships. This requires some additional setup because the physical layout of an owned relationship is not the same as the physical layout of an unowned relationship. Specifically, with an unowned relationship, the keys of the child entities are stored in a property on the parent, whereas with an owned relationship the keys of the child entities are not stored in a property on the parent - ownership is instead modeled using the hierarchical structure of the child key. Simple joins depend on the physical layout we get with unowned relationships, so if we're going to use simple joins with owned relationships we're going to need to change the way we store them.
Fortunately, as of SDK 1.3.1, there is a configuration option that changes the physical layout of owned relationships to match the physical layout of unowned relationships while preserving the hierarchical structure of the child key. Nice! You can enable this option by adding the following to your config:
<property name="datanucleus.appengine.storageVersion" value="WRITE_OWNED_CHILD_KEYS_TO_PARENTS"/>
If you're using JPA, add this to persistence.xml. If you're using JDO, add this to jdoconfig.xml. If you don't have any existing data in production you're ready to move on to the examples. If you do have existing data there's more work to do: data migration. If you're like me then you get sad, then angry, then hungry, then sad again whenever you see those words so I apologize for just throwing them in your face like that, but if you have existing data there's really no way around it. See, setting this config option will cause all newly created and updated parents to be written with a property containing child keys, but all your existing data is still sitting there in the old layout. If you want the results of your simple joins to be accurate (and you do want this, right?) you're going to need to migrate all your existing data. I'll explain how to do this in a later post (note that I said 'a later post' not 'the next post' - I may not get to this for awhile).
Let's go back to our college example. We have Students, Credits, and Honors Thesis. Every Student has zero or more Credits and zero or one Honors Thesis.
JPA:
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String year;
@OneToMany(cascade = CascadeType.ALL)
private List<Credit> credits;
@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
private HonorsThesis thesis;
}
@Entity
public class Credit {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String department;
}
@Entity
public class HonorsThesis {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private boolean complete;
}
JDO:
@PersistenceCapable(detachable = "true")
public class Student {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Long id;
private String year;
@Element(dependent = "true")
private List<Credit> credits;
@Persistent
private HonorsThesis thesis;
}
@PersistenceCapable(detachable = "true")
public class Credit {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String department;
}
@PersistenceCapable(detachable = "true")
public class HonorsThesis {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private boolean complete;
}
---------------------------------------
Here's a query that returns all Seniors with at least one credit from the Philosophy department:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.credits c where "
+ "c.department = 'Philosophy' and "
+ "s.year = 'Senior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName()
+ " where "
+ "credits.contains(c) && c.department == 'Philosophy' && "
+ "year == 'Senior'");
q.declareVariables(Credit.class.getName() + " c");
---------------------------------------
And here's a query that returns all Seniors who have completed an Honors Thesis:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.thesis t where "
+ "t.complete = true and "
+ "s.year = 'Senior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName() + " where "
+ "thesis == t && t.complete == true && "
+ "year == 'Senior'");
Fortunately, as of SDK 1.3.1, there is a configuration option that changes the physical layout of owned relationships to match the physical layout of unowned relationships while preserving the hierarchical structure of the child key. Nice! You can enable this option by adding the following to your config:
<property name="datanucleus.appengine.storageVersion" value="WRITE_OWNED_CHILD_KEYS_TO_PARENTS"/>
If you're using JPA, add this to persistence.xml. If you're using JDO, add this to jdoconfig.xml. If you don't have any existing data in production you're ready to move on to the examples. If you do have existing data there's more work to do: data migration. If you're like me then you get sad, then angry, then hungry, then sad again whenever you see those words so I apologize for just throwing them in your face like that, but if you have existing data there's really no way around it. See, setting this config option will cause all newly created and updated parents to be written with a property containing child keys, but all your existing data is still sitting there in the old layout. If you want the results of your simple joins to be accurate (and you do want this, right?) you're going to need to migrate all your existing data. I'll explain how to do this in a later post (note that I said 'a later post' not 'the next post' - I may not get to this for awhile).
Let's go back to our college example. We have Students, Credits, and Honors Thesis. Every Student has zero or more Credits and zero or one Honors Thesis.
JPA:
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String year;
@OneToMany(cascade = CascadeType.ALL)
private List<Credit> credits;
@OneToOne(fetch = FetchType.LAZY, cascade = CascadeType.ALL)
private HonorsThesis thesis;
}
@Entity
public class Credit {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String department;
}
@Entity
public class HonorsThesis {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private boolean complete;
}
JDO:
@PersistenceCapable(detachable = "true")
public class Student {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Long id;
private String year;
@Element(dependent = "true")
private List<Credit> credits;
@Persistent
private HonorsThesis thesis;
}
@PersistenceCapable(detachable = "true")
public class Credit {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String department;
}
@PersistenceCapable(detachable = "true")
public class HonorsThesis {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private boolean complete;
}
---------------------------------------
Here's a query that returns all Seniors with at least one credit from the Philosophy department:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.credits c where "
+ "c.department = 'Philosophy' and "
+ "s.year = 'Senior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName()
+ " where "
+ "credits.contains(c) && c.department == 'Philosophy' && "
+ "year == 'Senior'");
q.declareVariables(Credit.class.getName() + " c");
---------------------------------------
And here's a query that returns all Seniors who have completed an Honors Thesis:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.thesis t where "
+ "t.complete = true and "
+ "s.year = 'Senior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName() + " where "
+ "thesis == t && t.complete == true && "
+ "year == 'Senior'");
q.declareVariables(HonorsThesis.class.getName() + " t");
---------------------------------------
Unlike the unowned relationship examples we covered in the last post, we're now firmly back in the land of standard JPA and JDO. Good! That's where we want to be. Give it a try and let me know what you think! In the next post, hopefully less than 6 weeks away this time, I'll explain how simple joins work under the hood.
Friday, February 12, 2010
The Truth About Joins
Whenever I get the opportunity to talk to my peers about the coolness of App Engine it always pains me to have to say "...but you can't do joins." I know how useful joins are because I spent many many years using them before I started working on App Engine. Lately I've been thinking that there has to be a way to make it work. The App Engine datastore offers a guarantee that query performance scales with the size of the result set, not the size of the data set, so a query that returns 100 results should take the same amount of time whether you have a thousand rows or a million. That means we can't compute a full cross-product at time-of-query because that would require looking at all the data in the n tables we're joining, and if we look at all the data our performance is now tied to the size of the data set. Still, it feels like we ought to be able to do something here.
What if we set our sights a little bit lower? Sure, n-way joins with arbitrary filters and sort orders would be ideal, but isn't something better than nothing in this area? It turns out that with some additional restrictions in place it is possible to execute joins where the performance scales with the size of the result set, and as of the recently released SDK 1.3.1 this feature, which I'm calling "simple joins," is available to beta testers.
Now, before you start licking your chops let me give you all the bad news (there's plenty of it). First, simple joins are only available to JPA and JDO users. Why? Because JPA and JDO already have an established API for joins and API design is really really hard. I try to avoid it whenever possible. Second, simple joins only support equality filters. Ouch, right? Third, simple joins can only be sorted by the property on which you're joining. Double ouch. Fourth, if you're using owned relationships, in order to use simple joins you're going to need to upgrade your storage version and migrate existing data. This post is already long enough so I'm just going to focus on simple joins across unowned relationships. I'll cover storage versions, data migration, and joining across owned relationships in my next post, I promise.
I know that was a lot of bad news, but my theory is that even with all that bad news, simple joins are sufficiently useful. I'm making them available to try and prove that theory. Ready to try it out?
Let's model a college. We have Students, Courses, and Dorms. Every Student is associated with zero or one Dorm and zero or more Courses.
JPA:
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String year;
@Basic
private List<Key> courses = new ArrayList<Key>();
@OneToMany
@Column(name = "courses", insertable = false, updatable = false)
private List<Course> coursesAlias;
@Basic
private Key dorm;
@OneToOne(fetch = FetchType.LAZY)
@Column(name = "dorm", insertable = false, updatable = false)
private Dorm dormAlias;
}
@Entity
public class Course {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String department;
}
@Entity
public class Dorm {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String campus;
}
Yes, there's something funny (and by funny I mean hacky) going on with the courseAlias and dormAlias fields. The JPQL compiler won't let you join on a Key or a List<Key> field because these aren't OneToOne or OneToMany fields. To work around this I've created a read-only alias of the appropriate type. Your application code should always read from and write to the 'courses' and 'dorm' fields while your join queries should read from and write to the 'coursesAlias' and 'dormAlias' fields. Once we have proper support for unowned relationships all this nonsense will go away.
WARNING: There's a bug that causes aliased fields to be written to the datastore with incorrect names. Make sure the names of your aliased fields don't end in "Key" or else your join won't return any results. In the above example you could trigger this bug by renaming Student.dorm to Student.dormKey.
JDO:
@PersistenceCapable(detachable = "true")
public class Student {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Long id;
private String year;
@Element(dependent = "true")
private List<Key> courses = new ArrayList<Key>();
@Persistent
private Dorm dorm;
}
@PersistenceCapable(detachable = "true")
public class Course {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String department;
}
@PersistenceCapable(detachable = "true")
public class Dorm {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String campus;
}
---------------------------------------
Here's a query that returns all Juniors enrolled in a Biology course:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.coursesAlias c where "
+ "c.department = 'Biology' and "
+ "s.year = 'Junior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName()
+ " where "
+ "courses.contains(c) && c.department == 'Biology' && "
+ "year == 'Junior'");
q.declareVariables(Course.class.getName() + " c");
It's definitely strange to be referring to non-existent properties of a Key field and a List<Key> field, but since proper unowned relationships are not yet supported I eased up some of the JDOQL compiler's type checking to let this go through (sorry Andy, I know you hate this). Non-standard!
---------------------------------------
And here's a query that returns all Juniors who live in a dorm on the North campus of the school:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.dormAlias d where "
+ "d.campus = 'North' and "
+ "s.year = 'Junior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName() + " where "
+ "dorm == d && d.campus == 'North' && "
+ "year == 'Junior'");
---------------------------------------
These examples should be enough to get you started. Currently, only 2-way joins are supported but there's no reason it can't be extended to n-way. As for outer joins, arbitrary sorts, and inequality filters, my colleagues and I are still thinking hard about it.
Stay tuned for more posts in which I'll explain how to make this work for owned relationships and the underlying algorithm involved. Give simple joins a try and let me know how it goes!
What if we set our sights a little bit lower? Sure, n-way joins with arbitrary filters and sort orders would be ideal, but isn't something better than nothing in this area? It turns out that with some additional restrictions in place it is possible to execute joins where the performance scales with the size of the result set, and as of the recently released SDK 1.3.1 this feature, which I'm calling "simple joins," is available to beta testers.
Now, before you start licking your chops let me give you all the bad news (there's plenty of it). First, simple joins are only available to JPA and JDO users. Why? Because JPA and JDO already have an established API for joins and API design is really really hard. I try to avoid it whenever possible. Second, simple joins only support equality filters. Ouch, right? Third, simple joins can only be sorted by the property on which you're joining. Double ouch. Fourth, if you're using owned relationships, in order to use simple joins you're going to need to upgrade your storage version and migrate existing data. This post is already long enough so I'm just going to focus on simple joins across unowned relationships. I'll cover storage versions, data migration, and joining across owned relationships in my next post, I promise.
I know that was a lot of bad news, but my theory is that even with all that bad news, simple joins are sufficiently useful. I'm making them available to try and prove that theory. Ready to try it out?
Let's model a college. We have Students, Courses, and Dorms. Every Student is associated with zero or one Dorm and zero or more Courses.
JPA:
@Entity
public class Student {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String year;
@Basic
private List<Key> courses = new ArrayList<Key>();
@OneToMany
@Column(name = "courses", insertable = false, updatable = false)
private List<Course> coursesAlias;
@Basic
private Key dorm;
@OneToOne(fetch = FetchType.LAZY)
@Column(name = "dorm", insertable = false, updatable = false)
private Dorm dormAlias;
}
@Entity
public class Course {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String department;
}
@Entity
public class Dorm {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
private String campus;
}
Yes, there's something funny (and by funny I mean hacky) going on with the courseAlias and dormAlias fields. The JPQL compiler won't let you join on a Key or a List<Key> field because these aren't OneToOne or OneToMany fields. To work around this I've created a read-only alias of the appropriate type. Your application code should always read from and write to the 'courses' and 'dorm' fields while your join queries should read from and write to the 'coursesAlias' and 'dormAlias' fields. Once we have proper support for unowned relationships all this nonsense will go away.
WARNING: There's a bug that causes aliased fields to be written to the datastore with incorrect names. Make sure the names of your aliased fields don't end in "Key" or else your join won't return any results. In the above example you could trigger this bug by renaming Student.dorm to Student.dormKey.
JDO:
@PersistenceCapable(detachable = "true")
public class Student {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Long id;
private String year;
@Element(dependent = "true")
private List<Key> courses = new ArrayList<Key>();
@Persistent
private Dorm dorm;
}
@PersistenceCapable(detachable = "true")
public class Course {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String department;
}
@PersistenceCapable(detachable = "true")
public class Dorm {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
private String campus;
}
---------------------------------------
Here's a query that returns all Juniors enrolled in a Biology course:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.coursesAlias c where "
+ "c.department = 'Biology' and "
+ "s.year = 'Junior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName()
+ " where "
+ "courses.contains(c) && c.department == 'Biology' && "
+ "year == 'Junior'");
q.declareVariables(Course.class.getName() + " c");
It's definitely strange to be referring to non-existent properties of a Key field and a List<Key> field, but since proper unowned relationships are not yet supported I eased up some of the JDOQL compiler's type checking to let this go through (sorry Andy, I know you hate this). Non-standard!
---------------------------------------
And here's a query that returns all Juniors who live in a dorm on the North campus of the school:
JPA:
Query q = em.createQuery("select from " + Student.class.getName()
+ " s JOIN s.dormAlias d where "
+ "d.campus = 'North' and "
+ "s.year = 'Junior'");
JDO:
Query q = pm.newQuery("select from " + Student.class.getName() + " where "
+ "dorm == d && d.campus == 'North' && "
+ "year == 'Junior'");
q.declareVariables(Dorm.class.getName() + " d");
Again, I'm taking some liberties with standard JDOQL here.---------------------------------------
These examples should be enough to get you started. Currently, only 2-way joins are supported but there's no reason it can't be extended to n-way. As for outer joins, arbitrary sorts, and inequality filters, my colleagues and I are still thinking hard about it.
Stay tuned for more posts in which I'll explain how to make this work for owned relationships and the underlying algorithm involved. Give simple joins a try and let me know how it goes!
Friday, January 15, 2010
Querying with Key parameters
Today we're going to cover a small 'gotcha' that I've seen on the forums a number of times: How do you execute a query where one or more of the parameters are of type com.google.appengine.datastore.api.Key?
Did you know that a Key has two different string representations? You can see the first by calling Key.toString():
Key key = KeyFactory.createKey("Foo", 25);
System.out.println(key); // prints Foo(25)
You can see the second by calling KeyFactory.keyToString():
Key key = KeyFactory.createKey("Foo", 25);
System.out.println(KeyFactory.keyToString(key)); // prints agR0ZXN0cgkLEgNGb28YGQw
The first string is human readable, just like the javadoc for Object.toString() recommends. The second string is websafe (no escaping required) and easily parsed by a machine (it's just the Key object translated to a protocol buffer and then base-64 encoded). Seems reasonable, right? Key.toString() outputs a string that is useful in logging and debugging, KeyFactory.keyToString() outputs a string that you can safely pass around in forms and urls. Plus for those of you who, like me, care about symmetry in your apis, KeyFactory.keyToString() and KeyFactory.stringToKey() are symmetrical. If Key.toString() returned the websafe version, what would the opposite transform look like? Probably a static method on Key called fromString(), right? So now you have mirror methods but one requires an instance and the other is static. I don't like it but I digress. You're probably reading this because you'd like to learn something useful so let's move along.
The problem with having two string representations of the Key class stems from the fact that for a number of data types it's easy to "inline" the parameter value. Take this query for example:
JPA:
JPA:
public List<Flight> flightsByAircraftType(EntityManager em, String aircraftType) {
String queryStr = String.format("select from " + Flight.class.getName() +
" where aircraftType = '%s'", aircraftType);
" where aircraftType = '%s'", aircraftType);
return em.createQuery(queryStr).getResultList();
}
JDO:
public List<Flight> flightsByAircraftType(PersistenceManager pm, String aircraftType) {
String queryStr = String.format("select from " + Flight.class.getName() +
" where aircraftType == '%s'", aircraftType);
" where aircraftType == '%s'", aircraftType);
return (List<Flight>) pm.newQuery(queryStr).execute();
}
}
Concise, readable, and correct. But now let's do the same thing with a Key parameter:
JPA:
public List<Flight> flightPaginationByDest(EntityManager em, Key dest1, Key dest2) {
String queryStr = String.format("select from " + Flight.class.getName() +
" where dest > '%s' and dest <= '%s'", dest1, dest2);
" where dest > '%s' and dest <= '%s'", dest1, dest2);
return em.createQuery(queryStr).getResultList();
}
JDO:
public List<Flight> flightPaginationByDest(PersistenceManager pm, Key dest1, Key dest2) {
String queryStr = String.format("select from " + Flight.class.getName() +
" where dest > '%s' && dest <= '%s'", dest1, dest2);
return (List<Flight>) pm.newQuery(queryStr).execute();
}
Concise, readable, but incorrect. We just asked the datastore for all Flights departing from airports between "Airport(15)" and "Airport(30)", and that's not going to match anything, ever, because we passed a String parameter to a query on entities where every single "dest" property is of type Key. Remember, the datastore is schema-less. If we were using the low-level api you could create some flight entities where the "dest" property is of type String and some flight entities where the "dest" property is of type Key. The datastore has no problem with that.String queryStr = String.format("select from " + Flight.class.getName() +
" where dest > '%s' && dest <= '%s'", dest1, dest2);
return (List<Flight>) pm.newQuery(queryStr).execute();
}
So what's the solution? Ultimately the query compiler should detect this, and in future releases you can expect to get an exception if you attempt to match an unencoded String value against a Key. Until this is implemented I'd recommend using parameters:
JPA:
public List<Flight> flightPaginationByDest(EntityManager em, Key dest1, Key dest2) {
String queryStr = "select from " + Flight.class.getName() +
" where dest > :p1 and dest <= :p2";
Query q = em.createQuery(queryStr);
q.setParameter("p1", dest1);
q.setParameter("p2", dest2);
return q.getResultList();
}
String queryStr = "select from " + Flight.class.getName() +
" where dest > :p1 and dest <= :p2";
Query q = em.createQuery(queryStr);
q.setParameter("p1", dest1);
q.setParameter("p2", dest2);
return q.getResultList();
}
JDO:
public List<Flight> flightPaginationByDest(PersistenceManager pm, Key dest1, Key dest2) {
String queryStr = "select from " + Flight.class.getName() +
" where dest > :p1 && dest <= :p2";
return (List<Flight>) pm.newQuery(queryStr).execute(dest1, dest2);
}
Wednesday, December 30, 2009
More compact @PersistenceCapable declarations
When I started writing I said I was going to treat JDO and JPA equally in these posts, but I've just learned something so earth-shatteringly helpful that happens to be specific to JDO. What would you have me do? Should I withhold valuable information just because it doesn't have a corresponding representation in JPA? That hardly seems right. How about a compromise: I'm going to share the JDO nugget and I promise to make it up to the JPA users at some point in the future. Sound reasonable? We have a deal? Ok, good. Now hold on to your hats, here we go:
I've always written PersistenceCapable declarations like this:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Foo {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
// ...
}
Kinda verbose, right? It turns out that if you omit 'identityType' DataNucleus will still do the right thing so long as you've set your PrimaryKey up properly:
@PersistenceCapable
public class Foo {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
// ...
}
I just saved you 41 characters. Happy new year everyone!
I've always written PersistenceCapable declarations like this:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Foo {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
// ...
}
Kinda verbose, right? It turns out that if you omit 'identityType' DataNucleus will still do the right thing so long as you've set your PrimaryKey up properly:
@PersistenceCapable
public class Foo {
@PrimaryKey
@Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
private Key key;
// ...
}
I just saved you 41 characters. Happy new year everyone!
Friday, December 4, 2009
Queries with != and IN filters
Since we launched Google App Engine for Java back in April there has been a sizable disparity between the persistence features of Python and Java: Python had support for the != and IN operators in query filters and Java did not. With the release of SDK 1.2.8 this disparity is now gone (hooray!), but before you go rushing out to take advantage of these new operators I want to take some time to explain how the performance of queries that use these operators is a little bit different from what you're used to.
The key difference is that the datastore does not natively support != and IN. Instead, these operators are implemented in "userland." The code lives in appengine-api.jar, and it knows how to transform queries that use != and IN into queries that the datastore natively supports. Let's look at an example. Suppose you want to issue a query that returns all people not named Max, ordered by name and age.
Under the hood, this query gets broken up into two queries. The first query returns all people whose name is lexicographically smaller than Max, ordered by name and age. The second query returns all people whose name is lexicographically larger than Max, ordered by name and age.
JPA:
select from Person where name <> "Max" order by name, age
is translated into
select from Person where name < "Max" order by name, age
select from Person where name > "Max" order by name, age
JDO:
select from Person where name != "Max" order by name, age
is translated into
select from Person where name < "Max" order by name, age
select from Person where name > "Max" order by name, age
The results of these two queries are then merged, with the sort applied in-memory as we go. Both result sets are ordered by the same properties, so we can determine the next result to return by comparing the two results at the front of the two result sets. This makes the in-memory sort efficient. We also need to de-dupe as we go because a multi-value property may place an Entity in both result sets. The performance of this query is equivalent to the performance of the two underlying queries with some additional memory consumption because we need to maintain references to all the results we've already returned in order to de-dupe.
IN filters are implemented in similar fashion, but instead of requiring a fixed number of queries to fulfill they require N queries, where N is the number of values in the IN clause. Suppose you want all people whose favorite foods are cheeseburgers, pizza, and fried chicken, ordered by favoriteFood and age.
Under the hood this query gets broken up into three queries. The first query returns all people whose favorite food is cheeseburgers, ordered by favoriteFood and age. The second query returns all people whose favorite food is pizza, ordered by favoriteFood and age. The third query includes all people whose favorite food is fried chicken, ordered by favoriteFood and age.
JPA:
select from Person where
favoriteFood IN ('cheeseburger', 'pizza', 'fried chicken')
order by favoriteFood, age
favoriteFood IN ('cheeseburger', 'pizza', 'fried chicken')
order by favoriteFood, age
is translated into
select from Person where favoriteFood = "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood = "pizza" order by favoriteFood, age
select from Person where favoriteFood = "fried chicken" order by favoriteFood, age
JDO:
Query q = pm.newQuery(
"select from Person where :p1.contains(favoriteFood) order by favoriteFood, age");
"select from Person where :p1.contains(favoriteFood) order by favoriteFood, age");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"));
is translated into
select from Person where favoriteFood == "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood == "pizza" order by favoriteFood, age
select from Person where favoriteFood == "fried chicken" order by favoriteFood, age
Once again we merge the results of these three queries, sorting and de-duping as we go. Once again the performance of this query is equivalent to the performance of the underlying queries with some additional memory consumption.
Can you issue a query that combines != and IN? Absolutely. Can you have multiple IN filters? Absolutely again! But, be careful, if your query requires more than 30 underlying queries to fulfill you'll get an exception. Here's a query that requires too many underlying queries to fulfill:
JPA:
select from Person where
name <> "Max" AND
favoriteFood IN ("cheeseburger", "pizza", "fried chicken") AND
age IN (10, 11, 12, 13, 14, 15, 16, 17, 18 19) order by name
name <> "Max" AND
favoriteFood IN ("cheeseburger", "pizza", "fried chicken") AND
age IN (10, 11, 12, 13, 14, 15, 16, 17, 18 19) order by name
is translated into
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 11 order by name
.... and so on.
JDO:
Query q = pm.newQuery("select from Person where " +
"name != 'Max' AND " +
":p1.contains(favoriteFood) && :p2.contains(age) order by name");
"name != 'Max' AND " +
":p1.contains(favoriteFood) && :p2.contains(age) order by name");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"),
Arrays.asList(10, 11, 12, 13, 14, 15, 16, 17, 18, 19));
Arrays.asList(10, 11, 12, 13, 14, 15, 16, 17, 18, 19));
is translated into
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 11 order by name
.... and so on.
One final note that isn't entirely related but that I'll tack on anyway: You can now use the 'OR' operator in your queries as long as the query can be rewritten using IN. That means this now works:
JPA:
select from Person where
favoriteFood = "cheeseburger" OR
favoriteFood = "pizza" OR
favoriteFood = "fried chicken"
favoriteFood = "cheeseburger" OR
favoriteFood = "pizza" OR
favoriteFood = "fried chicken"
JDO:
select from Person where
favoriteFood == "cheeseburger" ||
favoriteFood == "pizza" ||
favoriteFood == "fried chicken"
favoriteFood == "cheeseburger" ||
favoriteFood == "pizza" ||
favoriteFood == "fried chicken"
But this still doesn't:
JPA:
select from Person where
favoriteFood = "cheeseburger" OR age = 10
favoriteFood = "cheeseburger" OR age = 10
JDO:
select from Person where
favoriteFood == "cheeseburger" || age == 10
favoriteFood == "cheeseburger" || age == 10
Enjoy the new operators!
Tuesday, November 24, 2009
Prerelease Of The Next SDK
Hi all,
Just a quick post to let you know that we've made a prerelease of SDK 1.2.8 available at http://code.google.com/p/googleappengine/downloads/list. This is for local development only, but should still give you an opportunity to try out our new features and let us know if anything looks amiss. Persistence-related features of note:
- Support for inheritance in JDO and JPA (finally!)
- Support for != filters in the low level api, JDO, and JPA
- Support for IN filters in the low level api, JDO, and JPA
All the gory details can be found in the release notes of the download.
Please give it a try!
Just a quick post to let you know that we've made a prerelease of SDK 1.2.8 available at http://code.google.com/p/googleappengine/downloads/list. This is for local development only, but should still give you an opportunity to try out our new features and let us know if anything looks amiss. Persistence-related features of note:
- Support for inheritance in JDO and JPA (finally!)
- Support for != filters in the low level api, JDO, and JPA
- Support for IN filters in the low level api, JDO, and JPA
All the gory details can be found in the release notes of the download.
Please give it a try!
Subscribe to:
Posts (Atom)