Parallel Model

Allow an alternative representation of the state of an application, either at a different time or in a hypothetical state.

Martin Fowler

12 December 2005

This is part of the Further Enterprise Application Architecture development writing that I was doing in the mid-2000s. Sadly too many other things have claimed my attention since, so I haven’t had time to work on them further, nor do I see much time in the foreseeable future. As such this material is very much in draft form and I won’t be doing any corrections or updates until I’m able to find time to work on it again.

Enterprise Applications are information systems which capture data about the state of the world. In many cases as we use information systems we want them to reflect our current best understanding of that world. At times we also may want to know about the world in the past or explore the consequences of what might happen in the future.

Most of the time we handle these alternative states of the world by building into our current state of the world facilities to capture data about these alternative states. A Temporal Property of past addresses for a customer is a good example of this.

However sometimes we want something more encompassing, something that really captures all the nuances of the world in a past or imagined state. A Parallel Model does this by allowing us to easily take our entire information store and represent it as it was in the past or as it might be in some alternative past, present, or future.

How it Works

When we talk about an information system that is able to query and manipulate any past state - or any hypothetical past, present or future states - it sounds distinctly like science fiction. Yet any half-way serious software development project does this all the time when we use version control systems. We can re-create any past state of the code, we can create alternative realities (through branches) and indeed we can hold on to multiple alternative realities at once (such as multiple active code-lines.)

At the heart of version control systems' ability to handle this mega-schizoid thinking is the fact that they use Event Sourcing. Every change to code is captured and stored. If on Friday you ask for the state of the code the previous Wednesday, the version control system conceptually starts with nothing, applies every code change event that occurred until Wednesday, and gives you the result. I say 'conceptually' because it probably doesn't do exactly that, but it does something observably equivalent.

The way to think about this is that the current state of an application is the result of taking a blank initial state and playing events to it. Every sequence of events leads to a different state. The state on Wednesday morning is the result of applying all the events that occurred before Wednesday, the state now is the result of playing all the events that ever occurred. (If you're reading this on a Wednesday morning you'll have to put the book down for a few hours so that the last sentence makes sense.)

Thinking of a state as the result of applying events sets up the approach for handling hypothetical realities. If you want to know what would have happened had Derek not made that idiotic check-in last Wednesday, you can do this (to some extent) by just playing every event other than that check-in. This gives you an alternate reality (or maybe the real one; we get dangerously close to THHGTTG when we think about these things).
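A minimal sketch of this idea follows. The `StockEvent` type and `Replay.StateAsOf` helper are invented for illustration and are not part of the shipping example used later; the sketch also assumes the log is already sorted by occurrence time.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type for illustration: a change in the stock level of an item.
public sealed class StockEvent {
  public DateTime Occurred { get; }
  public string Item { get; }
  public int Delta { get; }
  public StockEvent(DateTime occurred, string item, int delta) {
    Occurred = occurred; Item = item; Delta = delta;
  }
}

public static class Replay {
  // Builds a state by starting blank and playing events in log order.
  // Events after asOf are left out; so is any event the include filter rejects.
  // Assumes the log is sorted by Occurred.
  public static Dictionary<string, int> StateAsOf(
      IEnumerable<StockEvent> log, DateTime asOf, Func<StockEvent, bool> include = null) {
    var state = new Dictionary<string, int>();        // blank initial state
    foreach (StockEvent e in log) {
      if (e.Occurred > asOf) break;                    // beyond the requested point in time
      if (include != null && !include(e)) continue;    // dropped from this alternate reality
      state.TryGetValue(e.Item, out int current);      // current is 0 for an unseen item
      state[e.Item] = current + e.Delta;
    }
    return state;
  }
}
```

Calling `Replay.StateAsOf(log, wednesday)` gives a historic state, while passing a filter such as `e => e != unwantedEvent` gives the state that would have resulted had that one event never happened.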

We think quite normally of applying Parallel Model to our code, but we can also apply it to the things we write with our code. If we use Event Sourcing on an application that manages supply chains, we can explore the consequences of a mad Christmas rush on non-stick glue by creating a whole bunch of purchasing events, introducing them to a safe copy of the system, and seeing what happens. We can also see where that shipment of silent records was last week by creating a Parallel Model for last week.

To build a Parallel Model the first thing to check is that you have a system that uses Event Sourcing. If the system was designed with Event Sourcing in mind, that's the best. Failing that, there may be some way to reconstruct events from logs.

The next thing you'll need is a way to process events in a model that's separated from the usual reality. There are broadly two ways to do this. One is to construct a Parallel System which is a separate copy of the application with its own database and environment. The other is to allow a single system to be able to switch between embedded parallel models.

Parallel System

The great advantage of a parallel system is that you don't need to do anything to the application itself to construct a parallel model. You take a copy of the application and switch out its use of resources, such as a persistent data store, to use a copy. Then you feed it the events you need and run it.

A problem with a parallel system is what you do with the results. Since the application is ignorant of the parallelism, it can't manipulate or display results from multiple parallel models. To tie the results of parallel model processing into a broader system you'll need to use integration techniques, effectively treating the parallel system as an entirely separate application.

One of the benefits of using a parallel system is that you also have the ability to change the source code of the parallel system. With parallel models that are embedded into an application, any behavioral changes are limited to behavior that's built into the model that's being parallelized. While this can be substantial if you're using an adaptive object model, there are limits. With a parallel system you can make any change you like to the source code of the application and see the consequences of this on the event stream.

Embedded Parallel Models

If you want to embed multiple Parallel Models into an application you'll need to ensure any persistent storage can easily be switched between multiple actual data sources. A good way to do this is to use a Repository and provide a mechanism for the application to be able to switch Repositories at runtime. Temporary Repositories needn't be persistent - since Parallel Models tend to be temporary they can often be assembled in memory. Once you've processed your events and got the information you need from the Parallel Model you can discard the repository.

Switching the system to use a temporary repository means you have to first create this repository, initialize it (typically with the schema and any immutable reference data), and then replace the live repository with the temporary one in any places that the regular application code sees it. This is where using a single reference point for the repository is so important.
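One way to make the switch hard to get wrong is to scope it, so that the live repository always comes back. The following is only a sketch under assumptions: `IRepository`, `MemoryRepository` and `RepositoryRegistry` are hypothetical names, and a second in-memory repository stands in for the live database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical repository interface for illustration.
public interface IRepository {
  void Initialize();
  object Find(string key);
  void Store(string key, object value);
}

public sealed class MemoryRepository : IRepository {
  private readonly Dictionary<string, object> _store = new Dictionary<string, object>();
  // A real implementation would load the schema and immutable reference data here.
  public void Initialize() { _store.Clear(); }
  public object Find(string key) => _store.TryGetValue(key, out var value) ? value : null;
  public void Store(string key, object value) { _store[key] = value; }
}

// The single reference point through which application code reaches its repository.
public static class RepositoryRegistry {
  private static IRepository _base = new MemoryRepository();   // stand-in for the live store
  private static IRepository _current = _base;
  public static IRepository Current => _current;

  // Initializes the temporary repository, makes it current, and returns a token
  // whose disposal switches back to the base repository.
  public static IDisposable UseTemporary(IRepository temporary) {
    temporary.Initialize();
    _current = temporary;
    return new Restore();
  }

  private sealed class Restore : IDisposable {
    public void Dispose() { _current = _base; }
  }
}
```

Wrapping the parallel work in `using (RepositoryRegistry.UseTemporary(new MemoryRepository())) { ... }` means the base repository is restored even if event processing throws.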

One advantage of using in-memory repositories like this is that processing can be much faster, since nothing has to go to the database.

Whichever way you do your parallelism there is always an effect where the models that are parallel are unaware of the existence of alternate realities, but there is an outer system that is aware of the parallel worlds. With embedded models the application itself is divided into parallel-aware and parallel-unaware segments; with parallel systems some systems are unaware and some other systems can be aware. It's also quite possible that you can have multiple levels of parallelism and merging. You may need a very strong cup of tea to deal with them.

Processing the Events

To build your parallel model you'll need to decide which events need to be processed, which depends on the nature of the Parallel Model. If it's a historic state, then you need all the events until that point. This can just be done by querying the main system event log. The nice thing about event logs is that they are a sequence of immutable objects, so taking copies is easy (although it can be slow).

If it's a hypothetical state then you'll need a way to define and inject the alternative events into the event stream that you want to process.
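As a rough sketch of injecting alternative events, the recorded log can be combined with a list of variant events into a new stream, leaving the recorded log untouched. `PurchaseEvent` and `Scenario.WithVariants` are hypothetical names made up for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type for illustration.
public sealed class PurchaseEvent {
  public DateTime Occurred { get; }
  public string Product { get; }
  public int Quantity { get; }
  public bool IsHypothetical { get; }
  public PurchaseEvent(DateTime occurred, string product, int quantity, bool isHypothetical = false) {
    Occurred = occurred; Product = product; Quantity = quantity; IsHypothetical = isHypothetical;
  }
}

public static class Scenario {
  // Returns a new event stream; the recorded log itself is never modified.
  // OrderBy is a stable sort, so a recorded event stays ahead of a variant
  // event that carries the same timestamp.
  public static List<PurchaseEvent> WithVariants(
      IEnumerable<PurchaseEvent> recorded, IEnumerable<PurchaseEvent> variants) {
    return recorded.Concat(variants).OrderBy(e => e.Occurred).ToList();
  }
}
```

Flagging the variant events, as `IsHypothetical` does here, is one simple way to keep them from ever being confused with the real record.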

Then you process your selected events in exactly the same way that you process the events for the live system. If you've arranged things properly there should be no changes made to any of the system at this point - you just run the regular stuff. At the end you have the application state and you can query it again using regular mechanisms.

One of the tangles of using a Parallel Model is that it's easy to get into an identity crisis. Usually when you have a model you have a clear correspondence between the entities in your model and both the persistent store and the real world. The Gregor object in your model corresponds to the one in your reality and you only have one of them. When you have a Parallel Model you can get into the situation where you have multiple Gregors, one for each Parallel Model: scary but also confusing. As a result you have to be very careful about what objects you hold onto as you juggle your Parallel Models, particularly if you have embedded Parallel Models. The least confusing thing to do is to ditch any objects when you switch repositories. Another alternative is to have every entity keep a reference to the repository it came from. Although parallel systems avoid this problem within a single system, you may run into it if you bring the results of parallel runs together.
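To illustrate the second alternative, here is a small sketch in which every entity remembers the repository it came from, so that an object from the wrong model is rejected rather than silently updated. The `Person` and `ModelRepository` classes are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public sealed class Person {
  public string Id { get; }
  public ModelRepository Origin { get; }   // the repository (and so the model) this object belongs to
  public string City { get; set; }
  public Person(string id, ModelRepository origin) { Id = id; Origin = origin; }
}

public sealed class ModelRepository {
  private readonly Dictionary<string, Person> _people = new Dictionary<string, Person>();
  public string Name { get; }
  public ModelRepository(string name) { Name = name; }

  public Person Create(string id, string city) {
    var person = new Person(id, this) { City = city };
    _people[id] = person;
    return person;
  }

  public Person Find(string id) => _people[id];

  // Translates an object from any model into this model's object with the same identity.
  public Person Counterpart(Person other) => Find(other.Id);

  // Refuses to update an object that belongs to a different model.
  public void Update(Person person, string city) {
    if (!ReferenceEquals(person.Origin, this))
      throw new InvalidOperationException(
          person.Id + " belongs to " + person.Origin.Name + ", not " + Name);
    person.City = city;
  }
}
```

With this in place, a Gregor fetched from a historic model has to be translated with `Counterpart` before the base model will accept changes to him.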

Optimizations

From the conceptual outline of creating a Parallel Model you can see that creating them is quite computationally expensive. Processing every event since the beginning of time is likely to take a while, particularly if there are lots of events. Version control systems have had to deal with this for a while and have come up with a few ways to reduce this burden.

One is that you don't really have to start with the beginning of time. You can take snapshots of the application state at various points and use the latest plausible snapshot as the start point for your Parallel Model. The latest plausible snapshot is the last snapshot before you do something different in the event stream. For a historical Parallel Model this will be the latest snapshot before the date of the Parallel Model. For a hypothetical Parallel Model it will be the last snapshot before the first variant event.

So let's take some examples. I'm at November 17, I want a Parallel Model for September 12 and I take snapshots at the beginning of every month. I can start with the September 1 snapshot and process every event that occurred from September 1 until September 12.

I'm at November 17 and I want to explore a sales rush next week. In this case I can make a snapshot from my current application state and add in my hypothetical events.

I'm at November 17 and want to explore what would have happened if I'd had sharply reduced sales in October. I start with the October 1 snapshot, and process my variant events.
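The snapshot selection in these examples can be sketched as follows. The `SaleEvent`, `Snapshot` and `SnapshotReplay` names are hypothetical, the state is reduced to a single running total, and the sketch assumes a snapshot holds the state just before any event stamped with its own time.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed class SaleEvent {
  public DateTime Occurred { get; }
  public int Amount { get; }
  public SaleEvent(DateTime occurred, int amount) { Occurred = occurred; Amount = amount; }
}

// State captured just before any event timestamped at TakenAt was processed.
public sealed class Snapshot {
  public DateTime TakenAt { get; }
  public int Total { get; }
  public Snapshot(DateTime takenAt, int total) { TakenAt = takenAt; Total = total; }
}

public static class SnapshotReplay {
  // The latest snapshot taken at or before the point where the Parallel Model
  // diverges: the target date for a historic model, or the time of the first
  // variant event for a hypothetical one. Returns null if there is none.
  public static Snapshot LatestPlausible(IEnumerable<Snapshot> snapshots, DateTime divergence) {
    return snapshots.Where(s => s.TakenAt <= divergence)
                    .OrderByDescending(s => s.TakenAt)
                    .FirstOrDefault();
  }

  // Historic query: start from the latest plausible snapshot (or the blank
  // state) and play only the events between it and the target date.
  public static int TotalAsOf(IEnumerable<Snapshot> snapshots, IEnumerable<SaleEvent> log, DateTime asOf) {
    Snapshot start = LatestPlausible(snapshots, asOf);
    DateTime from = start?.TakenAt ?? DateTime.MinValue;
    int total = start?.Total ?? 0;
    foreach (SaleEvent e in log)
      if (e.Occurred >= from && e.Occurred <= asOf) total += e.Amount;
    return total;
  }
}
```

For the September 12 query, `LatestPlausible` picks the September 1 snapshot and only the events from September 1 to September 12 are played.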

Another optimization is to work backwards as well as forwards. In order for this to work I need my events to be reversible. But with this I could create a historic Parallel Model for September 27th by starting with the October 1 snapshot and reversing the events from October 1 back to September 27th. Usually this will be faster as there are fewer events.
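A sketch of working backwards, assuming each event knows how to undo itself. The `Account`, `DepositEvent` and `ReverseReplay` names are hypothetical, and the log is assumed to be sorted by occurrence time.

```csharp
using System;
using System.Collections.Generic;

public sealed class Account {
  public int Balance { get; set; }
}

// Hypothetical reversible event: Reverse undoes exactly what Process did.
public sealed class DepositEvent {
  public DateTime Occurred { get; }
  public int Amount { get; }
  public DepositEvent(DateTime occurred, int amount) { Occurred = occurred; Amount = amount; }
  public void Process(Account account) { account.Balance += Amount; }
  public void Reverse(Account account) { account.Balance -= Amount; }
}

public static class ReverseReplay {
  // snapshotBalance is the state just before any event stamped at snapshotTime.
  // asOf must be earlier than snapshotTime.
  public static int BalanceAsOf(int snapshotBalance, DateTime snapshotTime,
                                IReadOnlyList<DepositEvent> log, DateTime asOf) {
    var account = new Account { Balance = snapshotBalance };
    // Walk the log from newest to oldest, undoing events that fall after asOf.
    for (int i = log.Count - 1; i >= 0; i--) {
      DepositEvent e = log[i];
      if (e.Occurred >= snapshotTime) continue;   // not included in the snapshot
      if (e.Occurred <= asOf) break;              // already part of the state we want
      e.Reverse(account);
    }
    return account.Balance;
  }
}
```

Only the events between the target date and the snapshot are touched, which is where the saving comes from.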

When you start thinking of processing fewer events, it's natural to think of a selective replay. If you want to look at the past state of my order, you may be able to ignore events that act on everyone else's orders, provided that those events have no effect on my order. Using selective replay can dramatically reduce how many events you have to process, but the difficult thing is being sure that there aren't subtle interactions between events. Not processing a fulfillment for another order could mean that the system erroneously thinks there's room on a transport, which completely messes up the history of my order. The more complex the system, the harder it is to spot these interactions. You can use selective replay with both forward and reverse processing; the advantages and risks are the same in each case.

It's a good idea to be aware of what the most common queries are. The Subversion version control system is aware that most requests are for the latest version of the code, so it stores that as a snapshot and uses reverse events from there to determine a past state (it also uses other snapshots from time to time).

The nice thing about all these optimizations is that they should be entirely hidden from the user of the system. You can always think of creating a Parallel Model by going forwards from the beginning of time. You can also start with that simple implementation and add snapshots later. (Reversible events are a bit more tricky to add later and are likely to affect the model.)

This provides an approach to testing where you randomly generate event sequences and construct Parallel Models first in an unoptimized way and then in various optimized ways. Any differences in the application state of the Parallel Models indicate a fault which you can investigate further.
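A sketch of such a test, with heavy simplifications that are assumptions of this illustration: events are plain integer deltas, a point in time is a position in the log, and the only optimization checked is snapshots taken every `interval` events. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ReplayCheck {
  // Reference implementation: play every event from the blank state.
  public static int Full(IReadOnlyList<int> deltas, int upTo) {
    int total = 0;
    for (int i = 0; i < upTo; i++) total += deltas[i];
    return total;
  }

  // snapshots[k] holds the total after the first k * interval events.
  public static List<int> TakeSnapshots(IReadOnlyList<int> deltas, int interval) {
    var snapshots = new List<int> { 0 };
    int total = 0;
    for (int i = 0; i < deltas.Count; i++) {
      total += deltas[i];
      if ((i + 1) % interval == 0) snapshots.Add(total);
    }
    return snapshots;
  }

  // Optimized implementation: start from the nearest earlier snapshot.
  public static int FromSnapshot(IReadOnlyList<int> deltas, IReadOnlyList<int> snapshots,
                                 int interval, int upTo) {
    int k = upTo / interval;
    int total = snapshots[k];
    for (int i = k * interval; i < upTo; i++) total += deltas[i];
    return total;
  }

  // Generates random event sequences and counts the positions at which the
  // two implementations disagree. A fixed seed keeps failures reproducible.
  public static int CountMismatches(int seed, int runs, int length, int interval) {
    var random = new Random(seed);
    int mismatches = 0;
    for (int run = 0; run < runs; run++) {
      var deltas = new List<int>();
      for (int i = 0; i < length; i++) deltas.Add(random.Next(-50, 51));
      List<int> snapshots = TakeSnapshots(deltas, interval);
      for (int upTo = 0; upTo <= length; upTo++)
        if (Full(deltas, upTo) != FromSnapshot(deltas, snapshots, interval, upTo)) mismatches++;
    }
    return mismatches;
  }
}
```

Any non-zero result from `CountMismatches` points at a sequence and position worth investigating.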

External Queries

One of the problems with Event Sourcing is that we have to remember the results of external queries. For Parallel Model you get an additional problem when considering hypothetical scenarios. If we are processing an event that didn't happen, we won't have a record of the result of external queries for that event.

In this situation we need to modify the gateways so that they can return data that we consider reasonable as the response of the external system. If we program the gateways with hypothetical remembered queries we can capture these as part of the setting up of the hypothetical scenario.
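One possible shape for such a gateway is sketched below. `IPriceGateway` and `ScenarioPriceGateway` are hypothetical names; the external system is imagined to be a pricing service queried by product and day.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical gateway interface to an external pricing system.
public interface IPriceGateway {
  decimal PriceFor(string product, DateTime on);
}

// Gateway used while processing a hypothetical scenario. It never calls the
// external system; it only plays back responses supplied with the scenario.
public sealed class ScenarioPriceGateway : IPriceGateway {
  private readonly Dictionary<(string, DateTime), decimal> _remembered =
      new Dictionary<(string, DateTime), decimal>();

  // Called while setting up the scenario, once per response we consider reasonable.
  public void Remember(string product, DateTime on, decimal price) {
    _remembered[(product, on.Date)] = price;
  }

  public decimal PriceFor(string product, DateTime on) {
    if (_remembered.TryGetValue((product, on.Date), out decimal price)) return price;
    // Failing loudly shows that the scenario definition is incomplete.
    throw new InvalidOperationException(
        "No remembered response for " + product + " on " + on.ToString("yyyy-MM-dd"));
  }
}
```

Throwing on a missing response is a design choice for this sketch; a scenario could equally supply a default answer, as long as that default is a deliberate part of the scenario.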

When to Use It

Parallel Model is one way of handling historic and alternative states. Another alternative is to embed this information in the model itself using such patterns as Temporal Property, Temporal Object and Effectivity.

One great advantage of using Parallel Model is that it removes the complexity of these patterns from the model. You can concentrate on making the model a simple snapshot model and completely ignore these multiple times and perspectives. Another strength is that putting these constructs in the model everywhere is hard work and every piece adds complexity. So you have to choose where you place them. If you neglect to put them in somewhere, you're stuck. Either you have to add them in later on, or maybe you won't get the chance because the data is lost forever. With Parallel Model you get temporal behavior everywhere. You pay just the cost of using Parallel Model once and the whole model gets the benefits.

With such strengths come negatives. The first is the pre-requisite need for Event Sourcing, which puts its own constraints and complexities on your model. The second issue is processing time. Every query for a Parallel Model requires event processing. You can only do so much with snapshots and other optimizations.

As with most patterns it isn't a completely either/or case in that it's quite possible to use Parallel Model as your general mechanism but use Temporal Property in some places for certain common requests. You also don't need to have Parallel Model from the beginning, although you do need to have Event Sourcing. However if you build a system using Event Sourcing you can add Parallel Model easily at a later time in those places where it's useful.

Example: Shipping Temporal Query (C#)

I've based this example on the simple example of shipping, cargoes and ports that I developed for Event Sourcing. As a result you should familiarize yourself with that example before digging too much into this one. The domain logic code is pretty much the same.

Figure 1: The domain objects for our shipping example.

Since much of the complexity of Parallel Model is about juggling temporary Parallel Models and an active database, I've introduced a database into this example. I'm using the popular NHibernate object-relational mapper to map to the database. I won't go over the details of the mapping, which isn't really very interesting. Instead I'll concentrate on how it's wrapped by a Repository and how to swap between it and a repository for the Parallel Model.

As with the base example, all of the changes to the domain model are handled by events. When a ship arrives in a port this is recorded by an arrival event.

class ArrivalEvent...

  string _port;
  int _ship;  
  internal ArrivalEvent (DateTime occurred, string port, int ship) : base (occurred) {
    this._port = port;
    this._ship = ship;
  } 
  internal Port Port {get {return Port.Find(_port);}}
  internal Ship Ship {get {return Ship.Find(_ship);}}
  
  internal override void Process() {
    Ship.HandleArrival(this);
  }

A difference to the base example is that in this case the port and ship are indicated by simple values as identifiers - in this case corresponding to the primary keys in the database, although any key will do. The properties, however, utilize the actual domain objects. To ensure we always get the right domain object we pull them from the database using a finder method defined on the domain object class.

class Port...

  public static Port Find(string s) {
    return (Port) Registry.DB.Find(typeof (Port), s);
  }

The finder method in turn, delegates to a Repository. In our base case, this repository is a database with domain objects mapped by NHibernate. So the repository code uses the NHibernate API to pull the object from the database (or NHibernate's cache).

class DB...

  public object Find(Type type, object key) {
    return _hibernateSession.Load(type, key);
  }

NHibernate, like most data sources that are based on Data Mapper, uses a Unit of Work to keep track of the objects it manipulates. As a result you never tell an object to save itself to the database, instead you commit the Unit of Work and it figures out which objects have changed in memory and how to write them out.

For this example I'll do this transaction wrapping in the event processor, whose process method now looks like this.

class EventProcessor...

  public void Process(DomainEvent e) {
    Registry.DB.BeginTransaction();
    try {
      e.Process();
      InsertToLog(e);
      Registry.DB.Commit();
    } catch (Exception ex) {
      Registry.DB.Rollback();
      LogFailedEvent(e, ex);
    }
  }

This is by no means always the best place to wrap your Unit of Work; that will depend on the nature of your application. But this works sufficiently for my example, and also presents the issue - how do we easily switch out the persistent repository to handle temporary queries?

Let's look at this through a test case:

class Tester...

  [Test] 
  public void TemporalQueryForShipsLocation() {
    eProc.Process(new ArrivalEvent(new DateTime(2005,11,2), la, kr));
    eProc.Process(new DepartureEvent(new DateTime(2005,11,5), la, kr ));
    eProc.Process(new ArrivalEvent(new DateTime(2005,11,6), sfo, kr));
    Assert.AreEqual(sfo, Ship.Find(kr).Port.Code);
    eProc.SetToEnd(new DateTime(2005,11,2));
    Assert.AreEqual(la, Ship.Find(kr).Port.Code);
  }

The crucial method here is SetToEnd, which alters our repository to use an in-memory repository and reprocesses the event log so that events play up to the last event on that day. This creates our Parallel Model for November 2nd.

class EventProcessor...

  IList log;
  public void SetToEnd(DateTime date) {
    SynchronizeLog();
    IRepository temporalRepository = new MemoryDB();
    Registry.enableAlternateRepository(temporalRepository);
    foreach (DomainEvent e in log) {
      if (e.Occurred > date) return;
      e.Process();
    }
  }

To run the temporal query the processor detaches itself entirely from the database. In this case the processor keeps its own copy of the event log. Before going off the database it synchronizes its log with the persistent log so that it has a full record of all the events written to the database.

Once the log is synchronized we then create a new repository that is entirely in-memory. This could be backed by an embedded in-memory database which would allow you to continue to use SQL. It can also be something handwritten that just satisfies the repository interface. Since this is a nice simple example I just used a bunch of hash tables.

class MemoryDB...

  public object Find(Type type, object key) {
    object result = this[type][key];
    if (null == result) 
      throw new ApplicationException ("unable to find: " + key.ToString());
    return result;
  }
  private IDictionary store = new Hashtable();
  private IDictionary this[Type index] {get {return (IDictionary)store[index];}}

The in-memory repository needs to be initialized to the same initial state as the database was in before processing any events - in this case holding the reference data of ports and ships.

class MemoryDB...

  public void Initialize() {
    store[typeof(Ship)] = new Hashtable();
    store[typeof(Cargo)] = new Hashtable();
    store[typeof(Port)] = new Hashtable();
    Insert(new Port("SFO", "San Francisco", "US"));
    Insert(new Port("LAX", "Los Angeles", "US"));
    Insert(new Port("YVR", "Vancouver", "CA"));
    Insert (new Port("XXX", "out to sea", "XX"));
    Insert(new Ship("King Roy", 1));
  }

With the temporary repository set up, we then tell the Registry to start using it instead of the real one.

class Registry...

  internal static void enableAlternateRepository(IRepository arg) {
    instance._repository = arg;
    arg.Initialize();
  }

Now any calls to the registry in the domain model will use the in-memory repository. Since this design places NHibernate inside the regular database repository, this means that we don't use NHibernate at all for the Parallel Model. Objects appear in memory and are held there by the repository.

Once the in-memory repository is set up we then process all the events that occurred on or before our target date, in order, from the log. Once done we have the in-memory repository representing the state of the world as at the end of the requested day. Any queries we now do on the domain objects will reflect that date.

To go back to the proper database we just swap out our temporary in-memory repository for the regular repository connection.

class Registry...

  internal static void restoreBaseRepository() {
    instance._repository = instance._baseRepository;
  }

Example: Comparing a Parallel Model with the base model (C#)

For many uses of Parallel Model we only work with one model at a time. General processing, including updates, is done with the base model and we build Parallel Models for historic and hypothetical queries. At any time we only have one Parallel Model in play. This scheme is relatively simple and avoids the questions of identity across parallel universes that beset so many science-fiction plots.

There are times, however, when mixing the two makes sense. If you remember the lame humor of the shipping example, you'll know about the mortal fear book distributors have of Canada and the risk of polluting text with 'eh'. Let's imagine that from time to time we get particularly nasty contagions of 'eh'. Any cargo in a ship in a port with such a contagion should be marked as high risk. Of course we don't find these things out on the day, so we have to figure out what was in the port on that day.

Here's a test case to express this problem.

class Tester...

  [Test]
  public void HighRiskDayFlagsAllCargo() {
    eProc.Process(new RegisterCargoEvent(new DateTime(2005,1,1), "UML Distilled", "UML", "LAX" ));
    eProc.Process(new RegisterCargoEvent(new DateTime(2005,1,1), "Planning XP", "PXP", "LAX" ));
    eProc.Process(new RegisterCargoEvent(new DateTime(2005,1,1), "Analysis Patterns", "AP", "LAX" ));
    eProc.Process(new RegisterCargoEvent(new DateTime(2005,1,1), "P of EAA", "eaa", "LAX" ));
    eProc.Process(new ArrivalEvent(new DateTime(2005,11,2), la, kr));
    eProc.Process(new LoadEvent(new DateTime(2005,5,11),"PXP", 1));
    eProc.Process(new LoadEvent(new DateTime(2005,5,11),"AP", 1));
    eProc.Process(new ArrivalEvent(new DateTime(2005,11,9), yvr, kr));
    eProc.Process(new ArrivalEvent(new DateTime(2005,11,12), la, kr));
    eProc.Process(new ContagionEvent(new DateTime(2005,11,10), yvr));
    Assert.IsTrue(Cargo.Find("PXP").IsHighRisk, "PXP should be high risk");
    Assert.IsTrue(Cargo.Find("AP").IsHighRisk, "AP should be high risk");
    Assert.IsFalse(Cargo.Find("UML").IsHighRisk, "UML should NOT be high risk");
    Assert.IsFalse(Cargo.Find("eaa").IsHighRisk, "eaa should NOT be high risk");
    Assert.IsFalse(Cargo.Find(refact).IsHighRisk, "refact should NOT be high risk");
  }

I'm describing the contagion with a new event.

class ContagionEvent...

  internal class ContagionEvent : DomainEvent
  {
    string _portID;
    public ContagionEvent(DateTime occurred, string port) : base(occurred) {
      this._portID = port;
    }
    Port Port {get {return Port.Find(_portID);}}

    internal override void Process() {
      // Switch to a temporary Parallel Model for the date of the contagion.
      Registry.EventProcessor.SetToEnd(Occurred);
      // Query the Parallel Model for the cargo on every ship in the port.
      ArrayList infectedCargos = new ArrayList();
      foreach (Ship s in Port.Ships) infectedCargos.AddRange(s.Cargos);
      // Return to the base model before making any updates.
      Registry.restoreBaseRepository();
      foreach (Cargo c in infectedCargos) {
        // Look up the base model's cargo by key; c belongs to the Parallel Model.
        Cargo actualCargo = Cargo.Find(c.RegistrationCode);
        actualCargo.IsHighRisk = true;
      }
    }
  }

You'll notice that in this case I've actually got quite a bit of behavior in the event's process method. The reason for this is that I decided I wanted the domain model to be ignorant of Parallel Model. Thus the event has to create the temporary Parallel Model for the date of the contagion, run a query to find out which cargoes are affected, restore the world to the base model, and pass on the updates. The domain model still does the logic within each Parallel Model, although there's not much of it.

Another way of doing this, which I would consider seriously, is to make new events for marking cargoes as high risk. In this case the contagion event would find the affected cargoes in the temporary Parallel Model, and then create an event to mark these cargoes as high risk. This second event would be run in the base state. As I write this I'll confess to not being sure which of these approaches I'd prefer.