Friday, December 4, 2009

Queries with != and IN filters

Since we launched Google App Engine for Java back in April there has been a sizable disparity between the persistence features of Python and Java: Python had support for the != and IN operators in query filters and Java did not. With the release of SDK 1.2.8 this disparity is now gone (hooray!), but before you go rushing out to take advantage of these new operators I want to take some time to explain how the performance of queries that use these operators is a little bit different from what you're used to.

The key difference is that the datastore does not natively support != and IN. Instead, these operators are implemented in "userland." The code lives in appengine-api.jar, and it knows how to transform queries that use != and IN into queries that the datastore natively supports. Let's look at an example. Suppose you want to issue a query that returns all people not named Max, ordered by name and age.

Under the hood, this query gets broken up into two queries. The first query returns all people whose name is lexicographically smaller than Max, ordered by name and age. The second query returns all people whose name is lexicographically larger than Max, ordered by name and age.

JPA:
select from Person where name <> "Max" order by name, age

is translated into

select from Person where name < "Max" order by name, age
select from Person where name > "Max" order by name, age

JDO:
select from Person where name != "Max" order by name, age

is translated into



select from Person where name < "Max" order by name, age 
select from Person where name > "Max" order by name, age

The results of these two queries are then merged, with the sort applied in-memory as we go. Both result sets are ordered by the same properties, so we can determine the next result to return by comparing the two results at the front of the two result sets. This makes the in-memory sort efficient. We also need to de-dupe as we go because a multi-value property may place an Entity in both result sets. The performance of this query is equivalent to the performance of the two underlying queries with some additional memory consumption because we need to maintain references to all the results we've already returned in order to de-dupe.

IN filters are implemented in similar fashion, but instead of requiring a fixed number of queries to fulfill they require N queries, where N is the number of values in the IN clause.  Suppose you want all people whose favorite foods are cheeseburgers, pizza, and fried chicken, ordered by favoriteFood and age.

Under the hood this query gets broken up into three queries. The first query returns all people whose favorite food is cheeseburgers, ordered by favoriteFood and age. The second query returns all people whose favorite food is pizza, ordered by favoriteFood and age. The third query includes all people whose favorite food is fried chicken, ordered by favoriteFood and age.

JPA:
select from Person where
    favoriteFood IN ('cheeseburger', 'pizza', 'fried chicken')
    order by favoriteFood, age

is translated into

select from Person where favoriteFood = "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood = "pizza" order by favoriteFood, age
select from Person where favoriteFood = "fried chicken" order by favoriteFood, age

JDO:
Query q = pm.newQuery(
    "select from Person where :p1.contains(favoriteFood) order by favoriteFood, age");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"));

is translated into

select from Person where favoriteFood == "cheeseburger" order by favoriteFood, age
select from Person where favoriteFood == "pizza" order by favoriteFood, age
select from Person where favoriteFood == "fried chicken" order by favoriteFood, age

Once again we merge the results of these three queries, sorting and de-duping as we go. Once again the performance of this query is equivalent to the performance of the underlying queries with some additional memory consumption.

Can you issue a query that combines != and IN? Absolutely. Can you have multiple IN filters? Absolutely again! But, be careful, if your query requires more than 30 underlying queries to fulfill you'll get an exception. Here's a query that requires too many underlying queries to fulfill:

JPA:
select from Person where 
    name <> "Max" AND 
    favoriteFood IN ("cheeseburger", "pizza", "fried chicken") AND
    age IN (10, 11, 12, 13, 14, 15, 16, 17, 18 19) order by name

is translated into

select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 11 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 12 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 13 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 14 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 15 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 16 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 17 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 18 order by name
select from Person where name < "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name > "Max" AND favoriteFood = "cheeseburger" AND age = 19 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 10 order by name
select from Person where name < "Max" AND favoriteFood = "pizza" AND age = 11 order by name
select from Person where name > "Max" AND favoriteFood = "pizza" AND age = 11 order by name

.... and so on.

JDO:
Query q = pm.newQuery("select from Person where " + 
    "name != 'Max' AND " +
    ":p1.contains(favoriteFood) && :p2.contains(age) order by name");
q.execute(Arrays.asList("cheeseburger", "pizza", "fried chicken"),
    Arrays.asList(10, 11, 12, 13, 14, 15, 16, 17, 18, 19));

is translated into

select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 11 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 12 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 13 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 14 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 15 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 16 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 17 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 18 order by name
select from Person where name < "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name > "Max" && favoriteFood == "cheeseburger" AND age == 19 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 10 order by name
select from Person where name < "Max" && favoriteFood == "pizza" AND age == 11 order by name
select from Person where name > "Max" && favoriteFood == "pizza" AND age == 11 order by name

.... and so on.

One final note that isn't entirely related but that I'll tack on anyway: You can now use the 'OR' operator in your queries as long as the query can be rewritten using IN. That means this now works:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR 
    favoriteFood = "pizza" OR 
    favoriteFood = "fried chicken"

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || 
    favoriteFood == "pizza" || 
    favoriteFood == "fried chicken"

But this still doesn't:

JPA:
select from Person where 
    favoriteFood = "cheeseburger" OR age = 10

JDO:
select from Person where 
    favoriteFood == "cheeseburger" || age == 10

Enjoy the new operators!

38 comments:

  1. I would still love to see also examples using the low-level datastore API, that would really be handy!

    ReplyDelete
  2. The low-level api has far less surface area than JDO and JPA so I would expect the javadoc to be sufficient for most things, but if you have questions about how to accomplish specific tasks, please ask and I'll answer as best I can.

    We have plans to document the low-level api alongside JDO and JPA in the official docs but this will take some time.

    ReplyDelete
  3. Do you think it is a good idea to simulate a capabilities, which are not supported by the underlying engine? It might cause creation of inefficient applications, because developers will be lured into the sense that the capabilities like IN do present and should be used whenever needed.

    ReplyDelete
  4. In general, no, we don't simulate capabilities that are not supported by the underlying engine. That's why we don't support aggregate functions, joins, etc. The rule of thumb is that we only support operations that scale with the size of the result set (as opposed to the size of the data set). The implementation of != and IN that I've described in this post meets that criteria. These queries are more expensive to fulfill by a factor related to the query itself but still tied to the size of the result set. So, in this case, I think the exception is warranted.

    ReplyDelete
  5. A very simple alternative to the latter question is to make available access to the in-memory query evaluator. This would then allow people to get the results they want for the query they supply, and they have specified a query extension to enable it ... hence know that things may be done non-optimally. Their query would then be portable to other JDO/JPA implementations too .... this is actually important to some people. This alternative was raised back in April (2009) as an issue, and is NOT a lot of work to enable.

    ReplyDelete
  6. may i know is there hard limit when using "IN" when doing jpa/jdo query?

    ReplyDelete
  7. i mean, how long the "select in" statement.

    ReplyDelete
  8. JPA and JDO don't impose any limitations, but if the query requires more than 30 underlying queries you'll get an exception. This is explained in more detail in the post.

    ReplyDelete
  9. thanks @Max ross . can you comment on socialwork presentation below ?

    DatastoreService dataSvc =...;

    Query query = new Query("MessageRecipients")
    .addFilter("recipients"), FilterOperator.EQUAL,userid)
    .addSort("date", SortDirection.DESCENDING)
    .setKeysOnly(); // <-- Only fetch keys!


    //*** from the presentation , mentioned the above can only be done
    using low level api, arent jdo/jpa can get the key only value as well
    //***http://gae-java-persistence.blogspot.com/2009/10/keys-only-
    queries.html


    List msgRecpts = dataSvc.prepare(query).asList();
    List parents = new ArrayList();
    for(Entity recep : msgREcpts) {
    parents.add(recep.getParent());



    }


    //Bulk fetch parents using key list
    Map msg = dataSvc.get(parents);

    //***for second part above, what is the different from doing directly
    "select in" statement in jpa/jdo ?
    //***http://gae-java-persistence.blogspot.com/2009/12/queries-with-and-
    in-filters.html

    ReplyDelete
  10. Good question. In the traditional sense, no. != and IN filters are executed in the datastore api implementation, which executes inside your app. Since we don't currently allow threads in your app the sub-queries are initiated in sequence. However, we only need one result from each sub-query to determine the next result to return for the top-level query, and query results aren't streamed back from the datastore until they are needed, so in a sense the queries are being evaluated in parallel, just not in parallel within the jvm where your application code is running.

    ReplyDelete
  11. When building a query using the low-level datastore API, am I limited to 30 underlying queries too?
    I need to count plenty of entites fast in geocells for my map based app.

    ReplyDelete
  12. Doesn't it support not in or not contains??

    ReplyDelete
  13. It is possible to not just test if the query string *contains* the Entity property, but also find out whether the query string contains the Entity property at the *beginning* of the query string?

    E.g.

    Entity Property: "this is just a test"
    Query String: "this is" - SHOULD MATCH
    Query String: "just a test" - SHOULD FAIL

    As it is, if I just use the IN operator, both query strings would match

    ReplyDelete
  14. I'm just replying to my previous. I think I totally misunderstood the function of contains()

    I thought it performed the same function as String.contains()

    But it's actually used to search if a property is contained in a collection of possible results.

    Oops.

    Is there any sort of String.contains() functionality in JDOQL queries?

    ReplyDelete
  15. This comment has been removed by the author.

    ReplyDelete
  16. The low level API permits batch gets when retrieving by entity key. They don't mention a limit of 30 anywhere for that.

    Is it possible to fetch more than 30 entities with JPA if you query with an IN clause on the key property?

    For example:
    select from MyEntity where id IN (:keyList)

    and then keyList is a collection with over 30 keys.

    ReplyDelete
  17. Hi Max,
    Nice Blog!!

    I have a couple of questions:
    1. I didn't understood this ':p1.contains(favoriteFood) && :p2.contains(age)'.
    Everything okay but what is ':p1' and ':p2'? Is it a syntax to specify 'parameter 1', 'parameter 2' ?

    2. Can we match substrings? If yes, how? Suppose datastore has values "looping", "loop", "loops" and the search parameter value is "loop". So the result should give all the 3 values. :p.contains returns only the search value(here "loop") is returned.

    Thanks

    ReplyDelete
  18. It's been nearly 4 years you started supporting != and IN *non-natively* through a kludge. Are there any plans to support != and IN natively any time soon?

    The problem is that in their current avatars, they (especially IN) result in rapid increase of costs and additionally, in limitations. Now I need to put checks in the system so that the total number of items in IN are not more than 30. Too problematic. Need a graceful solution here.

    ReplyDelete
  19. This is extremely a pleasant and instructive, containing all data and furthermore greatly affects the new innovation. A debt of gratitude is in order for sharing it. To know about top account on instagram, please go to webstagram

    ReplyDelete
  20. Thank you so much for spending time to write this blog article, i must say you are a excellent and talented blogger, I Found This Article To Be Very Informative and Helpful, please keep sharing such amazing blogs I am a Accountant from United States of America, Washington Dc, and I Love to read and Write Blog.
    Some Of My Web Blogs Awesome Blog Post Please Feel Free to Check Out My Blog you will find the process of Norton software installation with step by step instructions and much more.

    Keep Sharing will come back to read more,

    Click to Process

    norton.com/setup download and install
    norton.com/setup enter product key
    www.norton.com/setup
    login.norton.com
    my.norton.com
    norton.com/setup
    norton.com/setup activate
    norton.com/setup install
    norton.com/setup help
    norton.com/setup renewals
    norton.com/enroll product key

    ReplyDelete
  21. Thanks for provide great informatic and looking beautiful blog, really nice required information & the things i never imagined and i would request, wright more blog and blog post like that for us. Thanks you once agian
    we offer services birth certificate in delhi which inculde name add in birth certificate and birth certificate correction complete process is online and we offer birth certificate onlineand we offer this birth certificate apply online same service offers at yourdoorstep at birth certificate in ghaziabad our dream to provide birth certificate in india and other staes like birth certificate in bengaluru and birth certificate in gurgaon book service with us birth certificate in noida also, service at yoursdoorstep only.

    ReplyDelete
  22. Best method to create your primevideo mytv code account through the activation process by primevideo.com/mytv. You need to follow the instructions to activate www.amazon.com code. If you are viewing Amazon Prime Video on your device go toamazon.com/mytv.

    ReplyDelete
  23. Primevideo.com/mytv is Amazon online interface to enlist a gadget with the enactment code on the TV screen to watch Prime recordings substance on the gadget that produced the Prime video initiation code. The superb video enactment code interfaces the gadget to the real time feature and registers the gadget to transfer the help when you click on the excellent video application on the gadget.
    Read more…

    ReplyDelete
  24. Copy and paste the web link link in your web browser that is www.amazon.com/mytv enter code. Login to your prime video account using your prime video login credentials. After that enter the activation code you have on your screen .Devices Compatible with Prime Video. Exclusive content and Amazon Original series Enjoy exclusive movies and TV shows on your device.

    ReplyDelete
  25. Make sure that your OS is upgraded into the new variant. Restart the program by departing out, shutting the desktop program and re opening Tubi. ... Apparent Tubi cache and data stored in your own program. You can achieve it when you go to Settings onto your own Android apparatus, and selecting Apps > Tubi > tap Clear cache.
    tubi.tv/activate
    tubi.tv/activate enter code
    tubi.tv/activate

    ReplyDelete
  26. The Gmail application doesn't perform properly because of some reasons. The ofttimes occurring issues with the Gmail application ar associated with receiving emails and notifications. you have got to resolve these problems by following the Gmail not working troubleshooting steps below.

    ReplyDelete
  27. Music and Performing Arts today is growing to be fields that students would like to consider as professional education options. All professional courses In Music and Performing Arts are available through us. Coaching And Mentoring In Music & Performing Arts is also our forte.

    ReplyDelete
  28. This comment has been removed by the author.

    ReplyDelete
  29. This comment has been removed by the author.

    ReplyDelete
  30. The laptops are lightweight and portable, making them easy to carry to and from school. Visit

    ReplyDelete
  31. Before diving into the assistance options, it's crucial to understand why furniture matters. Furniture is not just about aesthetics; it plays a fundamental role in a family's daily life. Click here

    ReplyDelete