The Agile Information Factory framework supports the latest concepts in Information Management and delivers value by being agile and by focusing on cost reduction. Its key pillars are Data Vault 2.0 and Data Warehouse Automation, which will be applied during our reference projects. The other concepts within the framework can be applied depending on the organization's choices.
Data Warehouse Automation
The Data Warehousing Institute (TDWI) defines data warehouse automation as “using technology to gain efficiencies and improve effectiveness in data warehousing processes”. Data warehouse automation is much more than simply automating the development process.
By choosing Data Warehouse Automation you reduce costs:
- Generate 80% to 90% of the code for the Foundation Layer (the Data Vault 2.0 model)
- Generate testing code
- Automated release management
- Reduce overall cost by 40%
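The code-generation idea behind these numbers can be sketched in a few lines: instead of hand-writing every Foundation Layer table, the structures are generated from metadata. The entity name, business key, and naming conventions below are illustrative assumptions, not a prescribed standard.

```python
# Minimal sketch of metadata-driven Data Vault code generation:
# given an entity and its business key, emit the hub table DDL.
# Column names and types are illustrative only.

def generate_hub_ddl(entity: str, business_key: str) -> str:
    """Generate DDL for a Data Vault hub table from metadata."""
    return (
        f"CREATE TABLE hub_{entity} (\n"
        f"    {entity}_hk    CHAR(32)     NOT NULL,  -- hash key\n"
        f"    {business_key} VARCHAR(100) NOT NULL,  -- business key\n"
        f"    load_date      TIMESTAMP    NOT NULL,\n"
        f"    record_source  VARCHAR(50)  NOT NULL,\n"
        f"    PRIMARY KEY ({entity}_hk)\n"
        f")"
    )

print(generate_hub_ddl("customer", "customer_number"))
```

The same metadata can drive the generation of satellites, links, loading logic, and test code, which is where the bulk of the claimed savings comes from.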
Choosing Data Warehouse Automation also makes your delivery much faster (agility).
Ensemble modeling provides:
- Data recording – all changes are recorded, preserving full history
- Resilience to change – the Data Vault model is designed to reduce the impact of model changes
- Extensibility – easy to extend without impacting the existing model
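The history-recording characteristic above can be sketched as an insert-only loading pattern: a satellite never updates a row in place, it appends a new version with a load date. The table and attribute names are hypothetical, and a Python list stands in for the satellite table.

```python
from datetime import datetime, timezone

# Illustrative sketch of insert-only history recording in a satellite:
# every change is appended as a new version; nothing is overwritten.
sat_customer = []  # stands in for the sat_customer table

def load_satellite(hub_key: str, attributes: dict) -> None:
    """Append a new version only when the attributes actually changed."""
    current = [r for r in sat_customer if r["hub_key"] == hub_key]
    if current and current[-1]["attributes"] == attributes:
        return  # no change, nothing to record
    sat_customer.append({
        "hub_key": hub_key,
        "load_date": datetime.now(timezone.utc),
        "attributes": attributes,
    })

load_satellite("HK1", {"name": "Acme", "city": "Ghent"})
load_satellite("HK1", {"name": "Acme", "city": "Brussels"})  # change kept
print(len(sat_customer))  # 2 versions: full history preserved
```

Extensibility follows the same logic: a new source system simply means a new satellite hanging off the existing hub, so nothing already in production has to change.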
Instead of persisting your Presentation Layer (also called Data Marts), you can choose to virtualize it if your database environment allows it.
Virtualizing your Presentation Layer can lower the total development cost of your Enterprise Data Warehouse by 15%, because far less development is needed. Less development of course also means faster delivery (agility).
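A virtualized data mart is essentially a set of views on top of the Data Vault instead of physically loaded tables. As a minimal sketch, and with all object names (hub_customer, sat_customer, dim_customer) purely illustrative, a generator could emit the dimension view like this:

```python
# Sketch of virtualizing a Presentation Layer dimension: instead of
# persisting and loading dim_customer, generate a view over the vault.

def generate_dim_view(name: str, hub: str, satellite: str,
                      columns: list) -> str:
    """Generate a dimension view joining a hub to its satellite."""
    cols = ",\n    ".join(f"s.{c}" for c in columns)
    return (
        f"CREATE OR REPLACE VIEW dim_{name} AS\n"
        f"SELECT\n"
        f"    h.{name}_hk,\n"
        f"    {cols}\n"
        f"FROM {hub} h\n"
        f"JOIN {satellite} s ON s.{name}_hk = h.{name}_hk"
    )

print(generate_dim_view("customer", "hub_customer", "sat_customer",
                        ["name", "city"]))
```

Because no physical mart tables exist, there is also no mart loading logic to build, test, and schedule, which is where the development saving comes from.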
The Data Lab or Sandbox area has the following goals:
- To reduce shadow IT: end users building their own Excel or MS Access based reporting.
- To support Data Discovery exercises on structured or unstructured data.
- To support Data Mining exercises.
Data inside the Sandbox can combine data from the data warehouse, a Big Data “Data Pool”, and operational systems.
The Big Data “Data Pool” is the area that keeps all data with the following characteristics:
- High Volume – Terabytes, Petabytes, …
- High Velocity – Analysis of streaming data
- Variety – Different types of unstructured data
- Veracity – uncertainty of the data (business value? quality?) and low data value density
This Big Data area is based on the Hadoop Distributed File System (HDFS).
Big Data SQL provides optimized access to this data and combines it in real time with structured data.
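The combination described above boils down to a single federated query. As a hedged sketch, assuming an external table (here hypothetically named clickstream_ext) has already been defined over HDFS files with Big Data SQL, it can be joined directly with a warehouse table:

```python
# Hedged sketch: with Big Data SQL, HDFS data exposed as an external
# table can be joined in one statement with relational warehouse data.
# clickstream_ext and dim_customer are hypothetical table names; the
# external-table definition itself is platform-specific and omitted.

combined_query = """
SELECT c.customer_number,
       COUNT(*) AS clicks
FROM   dim_customer c      -- structured warehouse data
JOIN   clickstream_ext e   -- external table over HDFS files
       ON e.customer_id = c.customer_id
GROUP  BY c.customer_number
"""

print(combined_query)
```

The point is that the analyst writes ordinary SQL; the engine pushes the work down to the Hadoop cluster, so no separate extraction step is needed.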
Why would you choose a cloud infrastructure?
- Reduce Cost
- Lower resource usage
- Easier Management
- Faster Lifecycle Management.
- On the same hardware, up to X times better throughput