Thursday, October 15, 2009

Serialized Fields

(This was originally posted to the Google App Engine for Java Google Group on October 14, 2009)

Suppose one of your model objects has a field that is neither a native datastore type (like String or Date) nor another model object (like the child of an owned OneToOne relationship). Often you can use an Embedded class to store that field's sub-fields in the same record as the containing object, but what if an Embedded class isn't sufficient? This is where Serialized Fields come in. With JDO and JPA you can store any class that implements java.io.Serializable in a single property.
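The java.io.Serializable requirement covers the entire object graph reachable from the field, not only the top-level class. The sketch below is plain Java with no App Engine dependencies. Note, Holder, and canSerialize are hypothetical names used only for illustration. It shows that ObjectOutputStream throws NotSerializableException as soon as it reaches a referenced object whose class doesn't implement Serializable. That is why every class in the example that follows implements it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializableGraphCheck {

    // Hypothetical helper type that does NOT implement Serializable.
    static class Note {
        String text = "call after 5pm";
    }

    // Implements Serializable, but references a type that does not.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        Note note = new Note();
    }

    // Returns true if obj and everything it references can be serialized.
    static boolean canSerialize(Object obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            // Thrown when the stream reaches an object whose class isn't Serializable.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Holder())); // false: Note is not Serializable
        Holder h = new Holder();
        h.note = null;
        System.out.println(canSerialize(h));            // true: nothing unserializable left
    }
}
```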

For our example, let's model Person, ContactProfiles, ContactProfile, and PhoneNumber. A Person has exactly one ContactProfiles instance, a ContactProfiles has some number of ContactProfile instances, and a ContactProfile has some number of PhoneNumber instances. In our application, whenever we retrieve a Person from the datastore it makes sense to load their ContactProfiles as well. We'll define ContactProfiles, ContactProfile, and PhoneNumber first since they're the same whether we're using JPA or JDO:

import java.io.Serializable;
import java.util.List;

public class ContactProfiles implements Serializable {
    private final List<ContactProfile> profiles;

    public ContactProfiles(List<ContactProfile> profiles) {
        this.profiles = profiles;
    }

    public List<ContactProfile> get() {
        return profiles;
    }
}

public class ContactProfile implements Serializable {
    private String profileName;
    private List<PhoneNumber> phoneNumbers;

    // getters and setters
}

public class PhoneNumber implements Serializable {
    private String type;
    private String number;

    // getters and setters
}
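As written, these classes don't declare a serialVersionUID, so Java computes one from details of each class's structure. Many later edits, including adding a field, change that computed value, and previously stored bytes then fail to deserialize with an InvalidClassException. A common precaution, sketched below with hypothetical class names, is to declare the value explicitly before any data is stored. If data already exists, you would instead declare the value that was computed for the old class (the JDK's serialver tool reports it). ObjectStreamClass.lookup(...).getSerialVersionUID() returns the UID that is written to, and checked against, the stream.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerialVersionDemo {

    // Version-pinned PhoneNumber: the explicit serialVersionUID stays the same
    // when compatible changes (such as adding a field) are made later.
    static class PhoneNumber implements Serializable {
        private static final long serialVersionUID = 1L;
        private String type;
        private String number;
        // A field added in a later version would come back as its default
        // value (null for a String) when reading data written before it existed.
    }

    // Same fields, no explicit serialVersionUID: the JVM derives one from
    // the class's structure, so editing the class changes it.
    static class UnpinnedPhoneNumber implements Serializable {
        private String type;
        private String number;
    }

    // The UID that ObjectOutputStream writes for a class and that
    // ObjectInputStream checks when reading it back.
    static long uidOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(PhoneNumber.class));         // 1
        System.out.println(uidOf(UnpinnedPhoneNumber.class)); // a computed hash
    }
}
```

With the UID pinned, adding a field is a compatible change under the Java serialization rules: old data still deserializes and the new field simply takes its default value.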


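The JPA Person definition:

import com.google.appengine.api.datastore.Key;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Lob;

@Entity
public class Person {
    @Id
    @GeneratedValue(strategy=GenerationType.IDENTITY)
    private Key id;

    // stored as a single serialized Blob property
    @Lob
    private ContactProfiles contactProfiles;

    // getters and setters
}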


The JDO Person definition:

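import com.google.appengine.api.datastore.Key;
import javax.jdo.annotations.IdGeneratorStrategy;
import javax.jdo.annotations.IdentityType;
import javax.jdo.annotations.PersistenceCapable;
import javax.jdo.annotations.Persistent;
import javax.jdo.annotations.PrimaryKey;

@PersistenceCapable(identityType = IdentityType.APPLICATION, detachable = "true")
public class Person {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key id;

    @Persistent(serialized = "true")
    private ContactProfiles contactProfiles;

    // getters and setters
}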


In this example, the App Engine JDO/JPA implementation converts the Person.contactProfiles field into a com.google.appengine.api.datastore.Blob using standard Java serialization. This Blob is then stored as a single property on the Person Entity. When you fetch a Person, the corresponding Entity is retrieved in a single datastore RPC and the Blob is deserialized, leaving you with a ContactProfiles instance and all its associated sub-objects.
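The conversion can be approximated with plain Java serialization. This is a sketch, not the App Engine plugin's actual code: it assumes only that the field value is written with ObjectOutputStream, and it stops at the byte[] that would be wrapped in a Blob (the Blob class is left out so the example runs without the App Engine SDK). toBytes, fromBytes, and the trimmed-down PhoneNumber are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BlobRoundTrip {

    // Minimal stand-in for the post's PhoneNumber class.
    static class PhoneNumber implements Serializable {
        private static final long serialVersionUID = 1L;
        final String type;
        final String number;

        PhoneNumber(String type, String number) {
            this.type = type;
            this.number = number;
        }
    }

    // Serialize the whole object graph into one byte array
    // (roughly what ends up inside the Blob property).
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.toByteArray(); // read after close() so everything is flushed
    }

    // Rebuild the object graph from the stored bytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<PhoneNumber> numbers = new ArrayList<>();
        numbers.add(new PhoneNumber("home", "555-0100"));
        numbers.add(new PhoneNumber("work", "555-0199"));

        byte[] stored = toBytes(numbers);
        @SuppressWarnings("unchecked")
        List<PhoneNumber> loaded = (List<PhoneNumber>) fromBytes(stored);

        // A fresh copy of the graph comes back, not the original objects.
        System.out.println(loaded.size());                   // 2
        System.out.println(loaded.get(1).number);            // 555-0199
        System.out.println(loaded.get(0) != numbers.get(0)); // true
    }
}
```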

There are a few drawbacks to this approach that you should be aware of. First, even though your application understands the structure of the bytes that make up the ContactProfiles, the datastore does not. As a result, you can't filter or sort on any of the properties of ContactProfile or PhoneNumber. Second, all the usual gotchas of Java serialization apply here. If you evolve your Serializable classes, make sure you do it in a backwards-compatible way, or previously stored Blobs may no longer deserialize!

Third, you might be looking at the definition of the ContactProfiles class and thinking, "Why bother with a ContactProfiles class when Person could just store a List<ContactProfile>?" The reason is that updating a serialized field is not as simple as updating other types of fields, and this extra layer of indirection makes it a bit easier. DataNucleus has no way of "seeing" the changes you make to the internal values of a serialized field, so it doesn't know when it needs to flush those changes to the datastore. However, DataNucleus can see when the top-level serialized field reference changes. Giving Person a reference to a ContactProfiles object makes it easy to change ContactProfiles in a way that DataNucleus is guaranteed to recognize:

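public void addContactProfile(Person p, ContactProfile cp) {
    // copy the current list (uses java.util.ArrayList and java.util.List)
    // so the old ContactProfiles instance is left untouched
    List<ContactProfile> updated =
        new ArrayList<ContactProfile>(p.getContactProfiles().get());
    updated.add(cp);
    // give the person a new ContactProfiles reference so the update is detected
    p.setContactProfiles(new ContactProfiles(updated));
}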

The last line, where we set a new ContactProfiles instance back on the Person, is the key, and it's easy to miss, so be careful! If you forget this step, your updates will be ignored and you will be sad.
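Why that last line matters can be seen with a deliberately simplified model of reference-based change detection. This is not how DataNucleus is implemented; it only assumes, as described above, that a field counts as changed when its setter receives a different reference. TrackedPerson and Profiles are hypothetical stand-ins for an enhanced Person and for ContactProfiles.

```java
import java.util.ArrayList;
import java.util.List;

public class DirtyTrackingModel {

    // Stand-in for ContactProfiles: a wrapper around a list of profile names.
    static class Profiles {
        final List<String> items;

        Profiles(List<String> items) {
            this.items = items;
        }
    }

    // Stand-in for an enhanced Person. The setter marks the field dirty only
    // when handed a different object than the one it holds; it cannot see
    // changes made inside that object.
    static class TrackedPerson {
        private Profiles profiles;
        private boolean dirty;

        TrackedPerson(Profiles initial) {
            this.profiles = initial;
        }

        Profiles getProfiles() {
            return profiles;
        }

        void setProfiles(Profiles p) {
            if (p != profiles) {
                dirty = true;
            }
            profiles = p;
        }

        boolean isDirty() {
            return dirty;
        }
    }

    // In-place mutation only: the held reference never changes.
    static void addInPlace(TrackedPerson person, String name) {
        person.getProfiles().items.add(name);
    }

    // Copy, modify, and set a new wrapper, like the setContactProfiles step above.
    static void addWithNewReference(TrackedPerson person, String name) {
        List<String> updated = new ArrayList<String>(person.getProfiles().items);
        updated.add(name);
        person.setProfiles(new Profiles(updated));
    }

    public static void main(String[] args) {
        TrackedPerson a = new TrackedPerson(new Profiles(new ArrayList<String>()));
        addInPlace(a, "work");
        System.out.println(a.isDirty()); // false: this change would never be flushed

        TrackedPerson b = new TrackedPerson(new Profiles(new ArrayList<String>()));
        addWithNewReference(b, "work");
        System.out.println(b.isDirty()); // true: the setter saw a new reference
    }
}
```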

Despite the limitations I've listed, serialized fields are a good way to store structured data inside the record of a containing Entity. As long as you can get by without filtering or sorting on this data and you remember its special update requirements, serialized fields will almost certainly come in handy at some point.

6 comments:

  1. In ContactProfiles, some of the fields and methods are missing their type parameters because the < and > characters aren't HTML-escaped.

  2. I encountered a situation recently w/ JDO where simply creating a new top-level reference for a serialized field did not do the trick-- the blob was not getting updated. I confirmed via the debugger that I was definitely setting the serialized field to a new object*.
    (As a control, if I first set the serialized field to null, persisted, and then set it to the new object, all was well... so the new object definitely did have the correct content.)

    I could not figure out what was up, so I am simply calling JDOHelper.makeDirty() on the serialized field after setting it. This seems to work well. I'm wondering if there is any reason not to do this in general with serialized fields instead of generating a new reference.

    *Even though the parent object was new, the copied-over child fields did retain THEIR references from the old object, though the *contents* of some of those sub-objects had changed... I don't know if that was related, and whether a deep copy of the entire object tree to be serialized would have fixed the issue.

  3. That's definitely odd. If you care to post the code that exhibited the unexpected behavior, I can take a look.

    Thanks,
    Max

  4. Max,

    Thanks for putting this together. It was super helpful in resolving an issue I was running into.

    I would still like to know why I can't just use an ArrayList Object = new ArrayList in the Persistent class definition.....

    Saqib

  5. Hi Max,

    You said, "If you evolve your Serializable classes make sure you do it in a backwards compatible way! "
    Could you explain more about that?
    In your example, if we add a new field to 'PhoneNumber', what happens to existing 'Person' entities? What should I do to make it backward compatible?
