database · application architecture


Most EnterpriseApplications store persistent data with a database. This database supports operational updates of the application's state, and also various reports used for decision support and analysis. The operational needs and the reporting needs are, however, often quite different - with different requirements from a schema and different data access patterns. When this happens it's often a wise idea to separate the reporting needs into a reporting database, which takes a copy of the essential operational data but represents it in a different schema.

Such a reporting database is a completely different database to the operational database. It may be a completely different database product, using PolyglotPersistence. It should be designed around the reporting needs.

A reporting database has a number of advantages:

The downside to a reporting database is that its data has to be kept up to date. The easiest case is when you do something like use an overnight run to populate the reporting database. This often works quite well since many reporting needs work perfectly well with yesterday's data. If you need more timely data you can use a messaging system so that any changes to the operational database are forwarded to the reporting database. This is more complicated, but the data can be kept fresher. Often most reports can use slightly stale data and you can produce special case reports for things that really need to have this second's data [1].

A variation on this is to use views. This encapsulates the operational data and allows you to denormalize. In order to to separate the operational load from the reporting load you need to replicate the views to other nodes for reading. The main limitation is less flexibility to derive data than what you get from an in-memory programming environment.

A reporting database fits well when you have a lot of domain logic in a domain model or other in-memory code. The domain logic can be used to process updates to the operational data, but also to calculate derived data which to enrich the reporting database.


1: These days the desire seems to be for near-real time analytics. I'm skeptical of the value of this. Often when analyzing data trends you don't need to react right away, and your thinking improves when you give it time for a proper mulling. Reacting too quickly leads to a form of information panic, where you react badly to data that's changing too rapidly to get a proper picture of what's going on.


2004-04-02: original publication

2014-04-02: general update

2018-03-27: tweaked views paragraph to point out the ability to replicate view data (thanks to Matthias Hjalmarsson for the prod to do this).

if you found this article useful, please share it. I appreciate the feedback and encouragement

Find similar articles at these tags

database application architecture