User Defined Field

23 July 2013

A common feature in software systems is to allow users to define their own fields in data structures. Consider an address book - there's a host of things that you might want to add. With new social networks popping up every day, users might want to add a new field for a Bunglr id to their contacts.

For in-memory purposes, often the best way to do this is to allow classes to include a hashmap field for user-defined fields (a pattern Kent Beck calls Variable State).

# ruby
class Contact
  attr_accessor :firstname, :lastname

  def initialize
    @data = {}
  end

  def [] arg
    return @data[arg]
  end
  def []= key, value
    @data[key] = value
  end
end

aCustomer = Contact.new
aCustomer.firstname = "Martin"
aCustomer[:bunglrId] = 'fowl'

With a setup like this you can add affordances to your UI to allow users to attach new fields to objects. If you want common user defined fields, you can use a class variable to keep a list of common keys for the hashmap. There is some awkwardness in that regular fields are accessed differently to user-defined fields, but depending on your language even this can be overcome. If your language supports Dynamic Reception then you use this to access the hashmap with regular field access.

class Contact...

  def method_missing(meth, *args)
    if @data.has_key? meth
      return @data[meth]
    else
      super
    end
  end

Often the trickiest part of this is figuring out how to persist this. If you're using a schemaless database, then it's usually straightforward - you just add the user-defined keys to your application defined ones. The trickiness comes from a database with a storage schema, particularly a relational database.

Usually the best option is to use a Serialized LOB, essentially creating a large text column into which you store the user-defined fields as a JSON or XML document. Many databases these days offer pretty nice support for this approach, including support for indexing and querying based on the data structure within the LOB. However such support, if available, is usually more awkward than using fields. 1

1: Bret Taylor describes a scheme for indexing fields in a such a scheme by building separate index tables for each indexable field.

Another route is using some kind of attribute table. A table might look something like this.

CREATE TABLE ContactAttributes (
  contactId   INTEGER, 
  attribute   TEXT, 
  value       TEXT, 
  PRIMARY KEY (contactId, attribute))

Again, querying and indexing are awkward. Queries can involve a good bit of extra joins that can get rather messy.

Pre-defined custom fields offer another system. Here you set the schema up with fields like custom_field_1 (and perhaps custom_field_1_name. You are limited to only the number of custom fields per instance that you have pre-defined. As usual indexing and querying are awkward.

When using a attribute table or pre-defined custom fields you may choose to have different columns for different SQL data types. So pre-defined fields might be integer_1, integer_2, text_1…, or a attribute table might have multiple value fields (text_value, integer_value).

A dynamic schema is an approach that's often overlooked. To do this you set things up so that when someone adds a field, you use an alter table statement to add that field to the table. Our Mingle team does this and have been happy with how it's worked out. 2 The new fields can be indexed and queried just the same as application-defined fields. This does mean all instances get all fields, so is less handy if you get a lot of variance between instances.

2: Mingle's approach is actually a bit more involved than just adding fields to an existing table. Mingle's central record type is a card (which represents stories, tasks etc). The fields on a card vary by project and you can have many projects in the same database. So rather than use a single card table, mingle creates a new table for each project's card. It then adds fields dynamically to this table as users desire.

Your persistance scheme choices will be affected by what you use for relational mapping. User-defined fields aren't the most well-trod parts of the relational mapping problem, so there's a lot of variation in support from different relational mapping libraries.

User-defined fields are a similar problem to non-uniform types 3. Both problems lead to the need for a more flexible schema, or indeed a truly schemaless approach (although remember that schemaless doesn't mean you don't have a schema). If you have non-uniform types that aren't changing at the users' behest, then one of the inheritance oriented patterns may make sense. (Single Table Inheritance, Class Table Inheritance, or Concrete Table Inheritance.)

3: Non-uniform types are types where instances use a small and very different selection of fields. Sometimes these are called sparse tables, because if you look at the whole table each row only uses a small number of a large list of columns. The difference between non-uniform types and user-defined fields is that non-uniform types have all the potential fields known to developers, while user-defined fields allow fields to be created that developers will never know about.

Notes

1: Bret Taylor describes a scheme for indexing fields in a such a scheme by building separate index tables for each indexable field.

2: Mingle's approach is actually a bit more involved than just adding fields to an existing table. Mingle's central record type is a card (which represents stories, tasks etc). The fields on a card vary by project and you can have many projects in the same database. So rather than use a single card table, mingle creates a new table for each project's card. It then adds fields dynamically to this table as users desire.

3: Non-uniform types are types where instances use a small and very different selection of fields. Sometimes these are called sparse tables, because if you look at the whole table each row only uses a small number of a large list of columns. The difference between non-uniform types and user-defined fields is that non-uniform types have all the potential fields known to developers, while user-defined fields allow fields to be created that developers will never know about.