Data Vault 2.0 is one of the pillars that make our Enterprise Data Warehouse projects successful.
Data Vault 2.0 is not just a modeling technique; it covers three areas: modeling, methodology and architecture.
The component parts of the Data Vault ensemble are:
- HUB : natural business key
- LINK : natural business relationship
- SATELLITE : all context, descriptive data and history
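To make the three component types concrete, here is a minimal sketch in Python. It is an illustration only, not a prescribed Data Vault schema: the class and field names (`business_key`, `load_date`, `record_source`, `attributes`), the `history` helper and the sample customer and order values are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class Hub:
    """One row per natural business key (hypothetical field names)."""
    business_key: str
    load_date: datetime
    record_source: str


@dataclass
class Link:
    """One row per relationship between two or more business keys."""
    hub_business_keys: Tuple[str, ...]
    load_date: datetime
    record_source: str


@dataclass
class Satellite:
    """Descriptive context for a hub or link; each change adds a new row."""
    parent_key: str
    load_date: datetime
    record_source: str
    attributes: Dict[str, str]


def history(rows: List[Satellite], parent_key: str) -> List[Satellite]:
    """Return every satellite row for one parent key, oldest first."""
    matching = [row for row in rows if row.parent_key == parent_key]
    return sorted(matching, key=lambda row: row.load_date)


# Illustrative data: a customer, an order, and the relationship between them.
customer = Hub("C-1001", datetime(2024, 1, 1), "crm")
order = Hub("O-9001", datetime(2024, 1, 2), "shop")
placed = Link((customer.business_key, order.business_key),
              datetime(2024, 1, 2), "shop")

# Two satellite rows for the same customer: nothing is overwritten,
# so the full history of the "city" attribute stays available.
customer_details = [
    Satellite("C-1001", datetime(2024, 3, 1), "crm", {"city": "Ghent"}),
    Satellite("C-1001", datetime(2024, 1, 1), "crm", {"city": "Antwerp"}),
]
timeline = history(customer_details, "C-1001")
```

In this sketch the hub carries only the key, the link carries only the relationship, and every descriptive change becomes an additional satellite row rather than an update.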
Why Data Vault 2.0?
- Supports the incremental build of an Enterprise Data Warehouse (EDW)
- Simplifies and accelerates the build of an EDW
- High degree of parallelism when loading
  - No load dependencies between objects: all Hubs, Satellites and Links can be loaded in parallel
- Traceable and auditable
  - Audit IDs, creation dates, etc. are standard parts of the model
  - Model changes are also recorded as change history, so the model itself is fully auditable over time
- Completeness (atomic, all historic data): the vault acts as a data recorder
  - The standard approach is to store all data at the lowest level of granularity and to record the full change history of every attribute
- Resilient to change
  - New relationship to a new entity (for example, a new product entity related to customer via orders)
  - Relationship cardinality change (from 1-1 to 1-N to N-M)
  - New source for the same reference data (for example, customer)
  - New attributes
- Supports SQL and NoSQL (Hadoop and NoSQL databases)
  - Introduction of the hash key
    - Eliminates the key lookups and joins otherwise needed at load time, which are very expensive to execute in NoSQL environments
    - Hash bucket designation supports sharding on MPP systems (Hadoop, Pivotal Greenplum, …)
Sharding means distributing data across multiple compute nodes.
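The hash key and hash bucket ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the reference implementation: the choice of MD5, the trim-and-uppercase normalization, the `||` delimiter and the function names `hash_key` and `bucket_for` are all assumptions made for this example.

```python
import hashlib


def hash_key(*business_keys: str, delimiter: str = "||") -> str:
    """Derive a deterministic key from one or more business keys.

    Assumption: keys are trimmed and upper-cased before hashing so that
    the same business key from different sources yields the same hash.
    """
    normalized = delimiter.join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def bucket_for(key_hex: str, node_count: int) -> int:
    """Map a hash key to one of `node_count` shards (compute nodes)."""
    return int(key_hex, 16) % node_count


# The hub key depends only on the business key itself, so no lookup
# against an already-loaded hub table is needed to obtain it.
customer_hk = hash_key("C-1001")
order_hk = hash_key("O-9001")

# A link key is derived from the business keys of the hubs it connects,
# so the link can be loaded without waiting for either hub.
customer_order_hk = hash_key("C-1001", "O-9001")

# Hash bucket designation: the same key always lands on the same node.
node = bucket_for(customer_hk, node_count=8)
```

Because every key is computed from the source data alone, hub, link and satellite loads do not have to wait for one another, and the bucket number gives each row a stable home node.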