Google App Engine Java Persistence: Keys Only Queries

Thursday, October 15, 2009

Keys Only Queries

(This was originally posted to the Google App Java Google Group on October 6, 2009)

If you use the low-level datastore api you may have noticed that the com.google.appengine.api.datastore.Query class has a setKeysOnly() method. If you call this method before you execute the query the datastore will return com.google.appengine.api.datastore.Entity instances that have their keys filled in but none of their properties. This can reduce consumption of your Datastore Data Received from API quota, especially if you've got some large entities, but, more importantly, it can also reduce consumption of your Datastore CPU Time quota. How? Well, if the fulfillment of your query requires an index or a merge join your query actually executes in two stages: First it scans the index to find the keys of the entities that match and then it issues additional scans to retrieve the entities uniquely identified by the matching keys. If your query is keys-only we can skip that second step entirely. That means faster queries!

Now, JPA and JDO don't know anything about keys-only queries, but they do give you the flexibility to either return your entire object or some subset of the fields on your object. If you construct this subset to only contain the primary key of your object, the App Engine implementation of JPA and JDO will use a keys-only query. Let's look at some examples:

JPA:
@Entity
public class Book {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    private Date dateOfPublication;

    // getters and setters
}

Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing an EntityManager named 'em' for us):

public List keysOfBooksPublishedBetween(EntityManager em, Date from, Date to) {
    em.getTransaction().begin();
    try {
        Query q = em.createQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from AND dateOfPublication <= :to");
        q.setParameter("from", from);
        q.setParameter("to", to);
        return (List) q.getResultList();
    } finally {
        em.getTransaction().rollback();
    }
}

JDO:

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Book {

    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    private Date dateOfPublication;

    // getters and setters
}

Now let's implement a method that returns the Keys of all Books published betweeen 2 years (we'll assume someone else is creating and closing a PersistenceManager named 'pm' for us):

public List keysOfBooksPublishedBetween(PersistenceManager pm, Date from, Date to) {
    pm.currentTransaction().begin();
    try {
        Query q = pm.newQuery("select id from " + Book.class.getName()
            + " where dateOfPublication >= :from && dateOfPublication <= :to");
        return (List) q.execute(from, to);
    } finally {
        pm.currentTransaction().rollback();
    }
}
--------------------------------
Notice how we are only selecting the 'id' field from our Book class. This is crucial. If you select any other fields your query will end up fetching entire entities and the optimization will be lost.

12 comments:

UnknownOctober 30, 2009 at 7:25 AM
Why is the transaction rolled-back at the end of the each keysOfBooksPublishedBetween() function call?
ReplyDelete
Replies
MaxOctober 30, 2009 at 9:44 AM
The transaction needs to be either committed or rolled back. I'm not performing any writes inside the transaction so I'm rolling it back.
ReplyDelete
Replies
UnknownOctober 30, 2009 at 11:22 AM
Thanks! That makes sense.

Is it necessary to wrap this kind of call in a transaction?

I have a JUnit test on my devserver that works the same (i.e., gets the keys it should be getting) whether I start a transaction or not (haven't tried in appengine yet).
ReplyDelete
Replies
MaxOctober 30, 2009 at 4:06 PM
I generally wrap everything in a transaction unless I can't due to multiple entity groups. If you're issuing a query that maps to a batch get and the entities you're fetching belong to multiple entity groups you'll get an exception.

The behavior should be identical whether you're running locally or in prod. If it's not please let me know.
ReplyDelete
Replies
UnknownJanuary 26, 2010 at 1:12 PM
I do understand the sentence:
"The transaction needs to be either committed or rolled back".
I don't know much about JDO/JPA/DataNucleus, so sorry if those questions seem basic. I can see the transaction.begin, the transaction.rollback, but not the transaction.commit. Is the transaction.commit implicit on "return" of the operation ? o did you omit it for simplicity of the example ?

I do not understand the logic of this sentence:
"I'm not performing any writes inside the transaction so I'm rolling it back".

The transaction contains just a "select", so the most we did here, under the cover, is potentially to read-lock some data. Even if we did read-lock, a (implicit or explicit) transaction.commit should release the read-locks (and the world should be back like it was before the transaction started), no ?

What does transaction.rollback do for you here ? No write-lock or data updates, so what the rollback does ?

I would expect the need for and explicit "transaction.rollback" if soemthing goes wrong in a transaction where we write-lock some data (i.e. update/create/delete data), but not in a transaction that just read data.

Thanks,
ReplyDelete
Replies
MaxJanuary 28, 2010 at 9:55 AM
If you start a transaction you need to either commit it or roll it back. If you just leave it open it will eventually time out on the server side, and the cleanup it performs is equivalent to a rollback, so I'm just making it explicit. You could just as easily commit the transaction the behavior of your application would be no different.
ReplyDelete
Replies
UnknownJanuary 29, 2010 at 12:21 PM
Okay thanks. I guess for me the "commit" would have been logical rather than the "rollback", even though in this case they behave the same.

Thanks.
ReplyDelete
Replies
UnknownJuly 3, 2010 at 12:04 PM
See
http://code.google.com/p/googleappengine/issues/detail?id=3423

for issues when making keys only queries.
ReplyDelete
Replies
Sharon WoodMarch 31, 2019 at 10:15 PM
This comment has been removed by the author.
ReplyDelete
Replies
Dale MorrisApril 9, 2019 at 9:55 PM
Norton Antivirus Customer Service Number
mcafee technical support phone number
Malwarebytes tech support phone number
hp printer support mac
canon printer support telephone number
ReplyDelete
Replies
Darren DemersAugust 27, 2020 at 3:25 AM
If you use the low-level datastore api you may have noticed that the com.google.appengine.api.datastore.Query class has a setKeysOnly() method. If you call this method before you execute the query the datastore will return com.google.appengine.api.datastore.Entity instances that have their keys filled in but none of their properties. wholesale bed sheets , cheap comforter sets , gul ahmed sale 2018 bed sheets , bed sheet wedding , duvet covers king , pure cotton mattress online , blankets near me , bed razai , 6 seater sofa cover , crushed velvet bedding This can reduce consumption of your Datastore Data Received from API quota, especially if you've got some large entities, but, more importantly, it can also reduce consumption of your Datastore CPU Time quota. How? Well, if the fulfillment of your query requires an index or a merge join your query actually executes in two stages: First it scans the index to find the keys of the entities that match and then it issues additional scans to retrieve the entities uniquely identified by the matching keys. If your query is keys-only we can skip that second step entirely. That means faster queries!
ReplyDelete
Replies

Google App Engine Java Persistence

Thursday, October 15, 2009

Keys Only Queries

12 comments:

App Engine Links

Followers

Blog Archive

About Me