- Traditionally, data is thought of as coming from well-organized
databases with controlled schemas and strong validation
rules.
- But we are now seeing data in many forms: log files, message
queues, spreadsheets. This data is scattered throughout an
organization and its ecosystem.
- There is often little or no
schema to control its structure.
- Data is often non-uniform, with each element having
different properties.
- With multiple sources of data, crowdsourcing, and even
automated inference and discovery of data, there are big
problems with data quality.
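
To make the points about non-uniform data and data quality concrete, here is a minimal Python sketch. The records, the field names, and the helpers `field_coverage` and `missing_fields` are all hypothetical, invented for illustration. It assumes records arrive as plain dictionaries with no schema, and it only surveys which fields appear and which expected ones are absent or empty.

```python
from collections import Counter

# Hypothetical records, as they might arrive from a log file, a message
# queue, and a spreadsheet export. No schema forces them to agree.
records = [
    {"user": "alice", "event": "login", "ts": "2013-01-29T10:00:00"},
    {"user": "bob", "amount": 12.5},
    {"User": "carol", "event": "login", "notes": ""},
]


def field_coverage(rows):
    """Count how many records contain each field name."""
    return Counter(key for row in rows for key in row)


def missing_fields(row, required):
    """Return the required fields that are absent or empty in one record."""
    return sorted(f for f in required if row.get(f) in (None, ""))


coverage = field_coverage(records)
problems = [missing_fields(r, {"user", "event"}) for r in records]

for name, count in sorted(coverage.items()):
    print(f"{name}: present in {count} of {len(records)} records")
print(problems)
```

Even in three records the field names disagree (`user` versus `User`) and expected fields are missing. A survey like this is one simple way to see how uneven the data is before deciding how to handle it.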