Friday, November 20, 2009

Case Insensitive Queries

Hi everyone.  I've been posting a bit less frequently the past few weeks because the team has really been working hard on our next release.  It's almost ready.....we're excited to get it out to you!  On to today's topic....

I see questions about case insensitive queries on our message board pretty regularly.  We usually give people the same answer, but rather than continuing to describe the solution, why not just provide it?  Here we go.

If you're using a relational database it's pretty easy to execute case-insensitive queries because it's a native feature of the database.  It usually looks something like this:

 select * from Person where UPPER(lastName) = 'ROSS'

This is a challenge for the App Engine Datastore because we rely on executing scans over a limited subset of your data and returning results as we encounter them.  Consider the following strings in lexicographic order:
...
ROSE
...
ROSS
...
ROSTA
...
Rose
...
Ross
...
Rosta
...
rose
...
ross
...
rosta
...

As you can see there could be an unlimited number of rows in between 'ROSS' and 'Ross', and also between 'Ross' and 'ross' (okay not unlimited, we do have a limit on indexed string length, but definitely Large), so if we start scanning at 'ROSS' we might have to skip a huge number of results before we hit 'Ross', and that doesn't scale.  This is why the datastore doesn't support case-insensitive queries.

Fortunately it's not hard to implement support for case-insensitive queries in your application.  Here's the approach: for each field that you want to query in a case-insensitive way, create a duplicate field that stores the value of that field in either all upper or lowercase letters.  Then, have your model object register a pre-persist callback with your persistence framework (JPA or JDO), and then populate the duplicate fields inside that callback.  Here's what it looks like.

JPA:
import javax.persistence.PrePersist;
import javax.persistence.PreUpdate;
 

@Entity
public class Person {
  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  private String lastName; 
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }


  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  @PrePersist
  @PreUpdate
  public void prePersist() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  } 

}
 
public List<Person> getPeopleByLastName(String lastName) {
  Query q = em.createQuery("select from Person where uppercaseLastName = :p");
  q.setParameter("p", lastName.toUpperCase());
  return (List<Person>) q.getResultList(); 

}


JDO:
import javax.jdo.listener.StoreCallback;

@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class Person implements StoreCallback {
  @PrimaryKey
  @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
  private Long id;
  private String lastName;
  private String uppercaseLastName;

  public Long getId() {
    return id;
  }

  public String getLastName() {
    return lastName;
  }

  public void setLastName(String lastName) {
    this.lastName = lastName;
  }

  public void jdoPreStore() {
    if (lastName != null) {
      uppercaseLastName = lastName.toUpperCase();
    } else {
      uppercaseLastName = null;
    }
  }
}

public List<Person> getPeopleByLastName(String lastName) { 
  Query q = pm.newQuery(Person.class, "uppercaseLastName == :p"); 
  return (List<Person>) q.execute(lastName.toUpperCase()); 
}

Note that in both examples I have not created a getter or a setter for the uppercaseLastName field.  That's because this field is part of the interface we're exposing to the persistence framework but not part of the public interface of the model object.  Your query code can filter by the uppercaseLastName field but the client code that interacts with the Person object has no reason to know that there are duplicate fields that are used to support case-insensitive queries.

49 comments:

  1. thanks for that post, max.

    a simple question: wouldn't the property name then be a candidate for being defined as "unindexed" in regard to optimization?

    ReplyDelete
  2. If you're sure you'll never want to filter or sort by that propery in a case sensitive way then sure, "unindexed" sounds reasonable. Still, make sure the optimization is really necessary. It's much easier to change a property from "indexed" to "unindexed" than the other way around (see previous post).

    ReplyDelete
  3. Do you have an opinion on whether or not toUpperCase should be used vs. toLowerCase? I can't think of any reason for one over the other, but am wondering if there's something to think about if Locale's are introduced.

    For example, my entity stores a Locale in it along with a lower-case field, and based on the locale, I perform the lowercase function in my prePersist(). Not sure if it really matters for English, but perhaps for other language sets?

    ReplyDelete
  4. Interesting question, I don't know the answer. Perhaps someone else can chime in?

    ReplyDelete
  5. JavaDoc from String#toUpperCase()
    -----------------------------
    Converts all of the characters in this String to upper case using the rules of the default locale. This method is equivalent to toUpperCase(Locale.getDefault()).

    Note: This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags. For instance, "title".toUpperCase() in a Turkish locale returns "T\u0130TLE", where '\u0130' is the LATIN CAPITAL LETTER I WITH DOT ABOVE character. To obtain correct results for locale insensitive strings, use toUpperCase(Locale.ENGLISH).
    -----------------------------

    So, your solution (just as SQL one) will not work for Turkish language.

    E.g we have name "Likas". It will be stored as "LIKAS" (Locale.ENGLISH is the default one in AppEngine, right?). But in Turkish uppercase variant is "LİKAS". So, if some Turkish person will look for "LİKAS", he will not fined "Likas". Hopefully you get my point :)

    ReplyDelete
  6. So a good solution might be to capture the Locale of the user entering in a certain type of data, and then perform a lower/upper case translation with that user's locale (String.toLowerCase(Locale.XYZ))

    Not as easy as Max's solution since it requires one more piece of data (the Locale), but should work if you can capture the locale.

    ReplyDelete
  7. What is the benefit of the logic being in prePersist() instead of setLastName()

    ReplyDelete
  8. Good question! In this example I can't think of a benefit, other than my desire to demonstrate that these callbacks are easy to use. However, if you needed to compute some derived state based on the value of more than one field you'd definitely want the logic in prePersist() since you don't need to think about the order in which your setters are invoked.

    ReplyDelete
  9. Ok, now I'm stuck with 20k records that do not have this. What can I do?

    cron? It can run for just 30 seconds right. How do I go about the entire process?

    I'm subscribing to ur blog now, have so much to learn!

    ReplyDelete
  10. I ran into the problem of mass updating 15k+ object lists. My first idea was to do it on localhost and use the bulk uploader to upload the whole DB, but the bulk uploader is only available in Python...

    So what I ended up doing was creating a Task which only processes n objects in a run and then reschedule a new task to continue with the remaining objects. I chose n to be a number that is safe to be finished in 30 seconds. I also ran into the problem of seeking into the middle of the resultset, as the Query.setRange() only accepts offsets below 1000... to solve this problem, I used used a Cursor which was being passed between the tasks, so the next task could continue exactly where the previous once finished, without having to rerun the appstore query.

    See AppStore documentation for Tasks and Data storage / Cursor.

    ReplyDelete
  11. Thanks for the post. Really important;

    ReplyDelete
  12. Thank you!

    Too bad I saw this after a few wasted hours...

    ReplyDelete
  13. A woman wakes up in the middle of the night in a state of crisis. Her life flashes before her eyes as she struggles to get to the bottom of her anxiety.limo Las Vegas

    ReplyDelete
  14. Your column father fasts an extend of their dally duess on buggy vias these persons reconsider sonorous censor match census irrelevant so that you desire to and gown prestige per despite basss.legal recruitment

    ReplyDelete
  15. Getting the best price on hard wood flooring in atlanta can be difficult. Because of this I thought this article may be helpful to you and your readers.best vitamins for arthritis

    ReplyDelete
  16. Great blog! I really love how it is easy on my eyes and the information are well written. I am wondering how I might be notified whenever a new post has been made. I have subscribed to your rss feed which really should do the trick! Have a nice day!

    ReplyDelete
  17. Very interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article.e procurement

    ReplyDelete
  18. this got me thinking what other workouts are good for those of us who find ourselves on the road or have limited.Ft Lauderdale Patent Attorney

    ReplyDelete
  19. Interesting blog. This is one of my favorite blog also I want you to update more post like this. Thanks for sharing this article.send money abroad online

    ReplyDelete
  20. I would love to stop by. But, I think it might have to wait until this summer. I did not know that Serlkay had ever expanded its size. I must say that a succesful family owned business in this.buy job posting

    ReplyDelete
  21. This can be well-articulated, sharpened, and also aim. Precisely what important lessons have you found out that an individual applied to this post. Looking forward to go through your experience. Where have you came your own bottom line?unique bow ties

    ReplyDelete
  22. You have a very inspiring way of exploring and sharing your thoughts. restaurant hood cleaning

    ReplyDelete
  23. My partner and i mood tutorial dapper your blog, I want to in order to necessitate some sort of troll elocution in order to tenor an individual although would like which you serviceable extension. Looking an individual the actual search associated with attraction pertaining to numerous your writing a blog surenesss.perfumes cacharel

    ReplyDelete
  24. I am having a great time today! Extremely helpful information specially the last part I care for such info a lot.I am just amazed to see this top notch article today.compare conveyancing

    ReplyDelete
  25. I don't have sufficient energy right now to completely read your site but I have bookmarked it and likewise include your RSS channels. I will be back in a day or two. much obliged concerning an incredible site.cheap money transfer

    ReplyDelete
  26. Thanks for sharing. I hope it will be helpful for too many people that are searching for this topic.70 volt speakers

    ReplyDelete
  27. ast our site definitely will indisputably turn out to be widely known with virtually all operating a blog families, due to its fastidious content pieces and / or review articles.janitorial cleaning services etobicoke

    ReplyDelete
  28. A case study evaluation needs one to address the company’s issue, analyze the alternate options, and propose the best answer using supporting evidence.oven cleaning

    ReplyDelete
  29. Thanks for this great post, i find it very interesting and very well thought out and put together. I look forward to reading your work in the future.room scheduling software for colleges

    ReplyDelete
  30. Pretty good post. I just stumbled upon your blog and wanted to say that I have really enjoyed reading your blog posts. Any way I’ll be subscribing to your feed and I hope you post again soon.paris escort

    ReplyDelete
  31. This diet is an intensive program that allows you to lose weight fast. The speed of this diet means you can do it while you are on a 3-week holiday.room signage software

    ReplyDelete
  32. Nice post. I leaгn something totally new and challenging on websites I stumbleupon on a daily bɑsis.personalization t shirts

    ReplyDelete
  33. This is really awesome and i love that.. This is very unique thing you put on that post.mtn data plans

    ReplyDelete
  34. It seems interesting. Its like you read my mind! You appear to know so much about this, like you wrote the book in it or something. I think that you could do with some pics to drive the message home a little bit, but other than that, this is great blog. A great read. I will certainly be back.mississauga mortgage agent

    ReplyDelete
  35. Everything is very open with a very clear clarification of the challenges. It was definitely informative. Your website is very useful. mortgage broker barrie

    ReplyDelete
  36. First of all Good day! I would just like to give thanks to the admin who posted this one I really appreciate it thank you.family dentist in Vaughan

    ReplyDelete
  37. Such a wonderful work has been done by you really your work is likeable to praise it is an awesome post. healthcare email database

    ReplyDelete
  38. It is a really good submit and incredibly helpful within the fascinating line of articles.Bound Gravel

    ReplyDelete
  39. When My partner and i go through some sort of blog site, I'm hoping who's isn't going to let down everyone approximately this blog. What I'm saying is,Commercial snow cone machines

    ReplyDelete
  40. Thank you so much for the post you do. I like your post and all you share with us is up to date and quite informative, i would like to bookmark the page so i can come here again to read you, as you have done a wonderful job.long island tree services

    ReplyDelete
  41. Thanks for sharing this information with us, i always like to read these articles, cool..!tree trimming Bronx County

    ReplyDelete
  42. I am really enjoying reading your well written articles. It looks like you spend a lot of effort and time on your blog. I have bookmarked it and I am looking forward to reading new articles.Resin Drives Diy Kit

    ReplyDelete
  43. Great post full of useful tips! My site is fairly new and I am also having a hard time getting my readers to leave comments. Analytics shows they are coming to the site but I have a feeling “nobody wants to be first”.Murcia

    ReplyDelete
  44. I really like the dear information you offer in your articles. I’m able to bookmark your site and show the kids check out up here generally. Im fairly positive theyre likely to be informed a great deal of new stuff here than anyone.ant extermination

    ReplyDelete