Schemaless Data Structures

In recent years, there's been an increasing amount of talk about the advantages of schemaless data. Being schemaless is one of the main reasons for interest in NoSQL databases. But there are many subtleties involved in schemalessness, both with respect to databases and in-memory data structures. These subtleties are present both in the meaning of schemaless and in the advantages and disadvantages of using a schemaless approach.

7 January 2013

This page is a fallback page for the proper infodeck.

Be wary of schemaless data structures, since they still have an implicit schema and, with a couple of exceptions, an explicit schema is better.

Martin Fowler

Our agenda
To understand “schemaless”, begin with a relational schema
A schemaless database allows you to store any data
but schemaless structures still have an implicit schema
The concept of “schema” also applies in-memory

A class definition defines the logical fields you can use to manipulate its instances. This is effectively a schema.

The same is true of any record structure (typed or not)

A Dictionary (aka Hash | Associative Array | Map) is a common way to make a schemaless data structure in memory. The notion of an implied schema still applies: there is little difference between aCustomer.firstname and aCustomer['firstname'].
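To make the comparison concrete, here is a minimal Python sketch. The `Customer` class, the sample values, and the two greeting functions are hypothetical illustrations, not something taken from the deck. Both functions depend on a field called `firstname`; the only difference is whether that dependency is declared anywhere.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Explicit schema: the fields are declared in one place.
    firstname: str
    lastname: str


def greeting_from_object(customer: Customer) -> str:
    # Relies on the declared field `firstname`.
    return f"Hello, {customer.firstname}"


def greeting_from_dict(customer: dict) -> str:
    # Relies on the key 'firstname' just as much, but nothing declares it.
    # This assumption is the implicit schema.
    return f"Hello, {customer['firstname']}"


a_customer = Customer(firstname="Martin", lastname="Fowler")
a_customer_dict = {"firstname": "Martin", "lastname": "Fowler"}
```

In both cases the reading code breaks if the field is renamed. With the dictionary, the only record of the expected keys is the code that reads them.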

Kent Beck's essential books on object-oriented programming describe this as the difference between Common State and Variable State.


One object can combine a schema and schemaless access
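One way this combination might look in Python is sketched below. The class name, the `custom` dictionary, and the choice of item access for the schemaless part are all assumptions made for illustration.

```python
class Customer:
    """Fixed fields form the schema; everything else lives in a dictionary."""

    def __init__(self, firstname, lastname, **custom):
        # Schema part: every customer has these two fields.
        self.firstname = firstname
        self.lastname = lastname
        # Schemaless part: any extra keys the caller supplies.
        self.custom = dict(custom)

    def __getitem__(self, key):
        # Item access reads only the schemaless part.
        return self.custom[key]

    def __setitem__(self, key, value):
        self.custom[key] = value


martin = Customer("Martin", "Fowler", middle_initial="X")
martin["zip"] = "02201"
```

Here `martin.firstname` goes through the declared fields, while `martin["zip"]` goes through the dictionary, so a reader of the class can see which fields are common to all customers and which vary.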
Schemaless extensions are common even with relational systems
customField_1_name    customField_1_value    customField_2_name
zip                   02201

firstname    lastname    customData
Martin       Fowler      {'middle_initial': 'X', 'zip': '02201'}

Customers
id      firstname
1234    Martin

CustomAttributes
table        key     fieldName    fieldValue
customers    1234    zip          02201
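The attribute-table layout (a Customers table plus a CustomAttributes table) can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions: the column names are renamed to `table_name`, `row_key`, `field_name` and `field_value` to avoid SQL reserved words, and `load_customer` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, firstname TEXT);
    CREATE TABLE custom_attributes (
        table_name  TEXT,     -- which table the attribute extends
        row_key     INTEGER,  -- primary key of the extended row
        field_name  TEXT,
        field_value TEXT
    );
    """
)
conn.execute("INSERT INTO customers VALUES (?, ?)", (1234, "Martin"))
conn.execute(
    "INSERT INTO custom_attributes VALUES (?, ?, ?, ?)",
    ("customers", 1234, "zip", "02201"),
)


def load_customer(conn, customer_id):
    """Merge the fixed columns with any custom attributes for one customer."""
    row = conn.execute(
        "SELECT id, firstname FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    attrs = conn.execute(
        "SELECT field_name, field_value FROM custom_attributes "
        "WHERE table_name = ? AND row_key = ?",
        ("customers", customer_id),
    ).fetchall()
    return {"id": row[0], "firstname": row[1], "custom": dict(attrs)}
```

The relational schema says nothing about which field names appear in `custom_attributes`, so that part of the data is schemaless even though it sits in a relational database.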

The examples so far are of Storage Schemas. Another form of schema is a Predicate Schema.
XML schemas are a familiar example of a predicate schema

XML File

Schema (in Relax NG compact syntax)
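As a rough analogue in Python rather than XML, a predicate schema can be read as a function that answers whether a given structure is acceptable, without dictating how it is stored. The sketch below rests on that reading. The function name and the rules it checks (two required string fields, an optional five-digit `zip`) are invented for illustration.

```python
def is_valid_customer(record) -> bool:
    """Return True if the record satisfies the (hypothetical) customer rules."""
    if not isinstance(record, dict):
        return False
    # Required fields must be present and must be strings.
    for field in ("firstname", "lastname"):
        if not isinstance(record.get(field), str):
            return False
    # 'zip' is optional, but if present it must be a five-digit string.
    zip_code = record.get("zip")
    if zip_code is not None:
        if not (isinstance(zip_code, str) and len(zip_code) == 5 and zip_code.isdigit()):
            return False
    # Any other keys are allowed; the predicate does not constrain them.
    return True
```

A record such as `{"firstname": "Martin", "lastname": "Fowler", "zip": "02201"}` passes, while one missing `lastname` does not.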


Schemas are a mechanism for documenting a contract
So what are the factors we should consider between using a schema and going without?
Implicit schemas are hidden, so use a schema if you can.

Custom fields work best with a schemaless approach
Non-uniform types are tricky for schemas. Schemaless stores offer a pragmatic alternative.
With no schema to update, schemaless migration is easier, but it still requires care.
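A small Python sketch of why care is still needed: if stored records are not rewritten when the implicit schema changes, the reading code has to understand every shape that was ever written. The scenario (a single `name` field later split into `firstname` and `lastname`) and the function name are hypothetical.

```python
def read_customer_name(record: dict) -> tuple:
    """Return (firstname, lastname) from either the old or the new record shape."""
    # New shape: separate fields.
    if "firstname" in record and "lastname" in record:
        return record["firstname"], record["lastname"]
    # Old shape: one 'name' field, split at the first space (an assumption).
    first, _, last = record["name"].partition(" ")
    return first, last


old_record = {"name": "Martin Fowler"}
new_record = {"firstname": "Martin", "lastname": "Fowler"}
```

Both records yield the same pair, but only because the reader carries the knowledge of the older implicit schema. Nothing in the store itself would flag an old record that was overlooked.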