Wednesday, December 30, 2009

More compact @PersistenceCapable declarations

When I started writing I said I was going to treat JDO and JPA equally in these posts, but I've just learned something so earth-shatteringly helpful that happens to be specific to JDO. What would you have me do? Should I withhold valuable information just because it doesn't have a corresponding representation in JPA? That hardly seems right. How about a compromise: I'm going to share the JDO nugget and I promise to make it up to the JPA users at some point in the future. Sound reasonable? We have a deal? Ok, good. Now hold on to your hats, here we go:

I've always written PersistenceCapable declarations like this:

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Foo {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    // ...
}

Kinda verbose, right? It turns out that if you omit 'identityType' DataNucleus will still do the right thing so long as you've set your PrimaryKey up properly:

@PersistenceCapable
public class Foo {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    // ...
}

I just saved you 41 characters. Happy new year everyone!

Friday, December 4, 2009

Queries with != and IN filters

Since we launched Google App Engine for Java back in April there has been a sizable disparity between the persistence features of Python and Java: Python had support for the != and IN operators in query filters and Java did not. With the release of SDK 1.2.8 this disparity is now gone (hooray!), but before you go rushing out to take advantage of these new operators I want to take some time to explain how the performance of queries that use these operators is a little bit different from what you're used to.

The key difference is that the datastore does not natively support != and IN. Instead, these operators are implemented in "userland." The code lives in appengine-api.jar, and it knows how to transform queries that use != and IN into queries that the datastore natively supports. Let's look at an example. Suppose you want to issue a query that returns all people not named Max, ordered by name and age.

Under the hood, this query gets broken up into two queries. The first query returns all people whose name is lexicographically smaller than Max, ordered by name and age. The second query returns all people whose name is lexicographically larger than Max, ordered by name and age.

JPA:
select from Person where name <> "Max" order by name, age

is translated into

select from Person where name < "Max" order by name, age
select from Person where name > "Max" order by name, age

JDO:
select from Person where name != "Max" order by name, age

is translated into



select from Person where name < "Max" order by name, age 
select from Person where name > "Max" order by name, age

The results of these two queries are then merged, with the sort applied in-memory as we go. Both result sets are ordered by the same properties, so we can determine the next result to return by comparing the two results at the front of the two result sets. This makes the in-memory sort efficient. We also need to de-dupe as we go because a multi-value property may place an Entity in both result sets. The performance of this query is equivalent to the performance of the two underlying queries with some additional memory consumption because we need to maintain references to all the results we've already returned in order to de-dupe.

IN filters are implemented in similar fashion, but instead of requiring a fixed number of queries to fulfill they require N queries, where N is the number of values in the IN clause.  Suppose you want all people whose favorite foods are cheeseburgers, pizza, and fried chicken, ordered by favoriteFood and age.

Under the hood this query gets broken up into three queries. The first query returns all people whose favorite food is cheeseburgers, ordered by favoriteFood and age. The second query returns all people whose favorite food is pizza, ordered by favoriteFood and age. The third query includes all people whose favorite food is fried chicken, ordered by favoriteFood and age.

JPA:
select from Person where
    favoriteFood IN ('cheeseburger', 'pizza', 'fried chicken')
    order by favoriteFood, age

is translated into

select from Person where favoriteFood = "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood = "pizza" order by favoriteFood, age
select from Person where favoriteFood = "fried chicken" order by favoriteFood, age

JDO:
Query q = pm.newQuery(
    "select from Person where :p1.contains(favoriteFood) order by favoriteFood, age");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"));

is translated into

select from Person where favoriteFood == "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood == "pizza" order by favoriteFood, age
select from Person where favoriteFood == "fried chicken" order by favoriteFood, age

Once again we merge the results of these three queries, sorting and de-duping as we go. Once again the performance of this query is equivalent to the performance of the underlying queries with some additional memory consumption.

Can you issue a query that combines != and IN? Absolutely. Can you have multiple IN filters? Absolutely again! But, be careful, if your query requires more than 30 underlying queries to fulfill you'll get an exception. Here's a query that requires too many underlying queries to fulfill:

JPA:
select from Person where 
    name <> "Max" AND 
    favoriteFood IN ("cheeseburger", "pizza", "fried chicken") AND
    age IN (10, 11, 12, 13, 14, 15, 16, 17, 18 19) order by name

is translated into

select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 11 order by name

.... and so on.

JDO:
Query q = pm.newQuery("select from Person where " + 
    "name != 'Max' AND " +
    ":p1.contains(favoriteFood) && :p2.contains(age) order by name");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"),
    Arrays.asList(10, 11, 12, 13, 14, 15, 16, 17, 18, 19));

is translated into

select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 11 order by name

.... and so on.

One final note that isn't entirely related but that I'll tack on anyway: You can now use the 'OR' operator in your queries as long as the query can be rewritten using IN. That means this now works:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR 
    favoriteFood = "pizza" OR 
    favoriteFood = "fried chicken"

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || 
    favoriteFood == "pizza" || 
    favoriteFood == "fried chicken"

But this still doesn't:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR age = 10

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || age == 10

Enjoy the new operators!