Last week, it happened again: a meeting of Data Vault geeks and people interested in Data Vault! And once again in the wonderful area of Vermont, USA. But this year we had +30°C, compared to last year’s -20°C. The conference was held in the wonderful Trapp Family Lodge in Stowe, Vermont. To give you some more insights, I’ll embed some of the tweets. If you want to read the full timeline, go here!
Right before the conference some of us did a “pre-workshop”, talking with Dan about special topics and some brand new stuff in Data Vault 2.0. One of these topics was a variation for dealing with ghost records or orphans in SAT(ellites) / equi-joins. Others were Point in Time tables in the Business Vault, teaming in the Data Vault 2.0 environment and Managed Self Service BI (M.SS.BI).
On Day 1 Claudia Imhoff (founder of the Boulder BI Brain Trust or #BBBT) spoke in her keynote Unleash the power of analytics about traditional warehousing and extending data warehouse architectures with new “modules” like
- Real time analytics engine as a kind of source system or
- Including seamless external data or
- Data provisioning as data refinery (refine raw data in a data lake to valuable data)
Next, Dan Linstedt (Founder of the Data Vault Methodology) talked about Data Vault 2.0 in his presentation Big Data, NoSQL and Modeling. The key points were
- ETL (in the sense of tools) is dead, not needed anymore,
- Education is essential and
- Performance issues with Hadoop.
Furthermore, Dan presented an outstanding client case in which a customer runs a really huge Data Vault 2.0 installation on Teradata Down Under.
Kent Graziano (Data Warrior, Oracle ACE Director) showed us in his famous way how to implement Data Vault at a customer without being allowed to either do or talk about Data Vault. Great presentation about Real world data warehousing, how to deal with politics and how to find new names for common Data Vault patterns.
Then it was my turn, with Temporal data warehouse and Data Vault. I’ll write a blog post about it later.
Thanks to Dirk Schittko for his awesome insight into the German social system / charity organisations and the challenges of building an easy-to-maintain, low-cost data warehouse with Data Vault 2.0 in “Business intelligence beyond graphs and tables”. Dirk solved this problem in one fifth of the time that big companies had estimated for it.
Roelant Vos gave us an amazing presentation and demo in his two talks, Allianz Global Assistance – Data Vault Case Study and New frontiers: Virtualize your EDW.
It is incredible how Roelant virtualizes a Data Vault out of a persistent staging area with metadata. Great. Roelant, I have to do it too!
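To make the idea a bit more concrete, here is a minimal sketch of metadata-driven virtualization. It is not Roelant’s implementation: the table names, the metadata layout and the helper `hub_view_sql` are my own assumptions for illustration, hash keys are left out for brevity, and SQLite stands in for a real warehouse platform. The point is only that one SQL pattern plus a metadata list yields a view per hub on top of the persistent staging area (PSA).

```python
import sqlite3

# Hypothetical metadata: one entry per hub, naming its PSA table and business key column.
HUB_METADATA = [
    {"hub": "hub_customer", "source": "psa_customer", "business_key": "customer_no"},
    {"hub": "hub_product", "source": "psa_product", "business_key": "product_code"},
]


def hub_view_sql(meta):
    """Render the same SQL pattern for every hub: one row per business key,
    with the first load date on which the key was seen in the PSA."""
    return (
        f"CREATE VIEW {meta['hub']} AS "
        f"SELECT {meta['business_key']} AS business_key, "
        f"MIN(load_dts) AS load_dts, "
        f"'{meta['source']}' AS record_source "
        f"FROM {meta['source']} "
        f"GROUP BY {meta['business_key']}"
    )


def build_demo():
    """Create a tiny in-memory PSA and put the virtual hubs on top of it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE psa_customer (customer_no TEXT, name TEXT, load_dts TEXT)")
    con.execute("CREATE TABLE psa_product (product_code TEXT, title TEXT, load_dts TEXT)")
    con.executemany(
        "INSERT INTO psa_customer VALUES (?, ?, ?)",
        [
            ("C1", "Alice", "2015-06-01"),
            ("C1", "Alice B.", "2015-06-02"),  # same key, later delivery
            ("C2", "Bob", "2015-06-02"),
        ],
    )
    con.executemany(
        "INSERT INTO psa_product VALUES (?, ?, ?)",
        [("P1", "Widget", "2015-06-01")],
    )
    # The hubs are views only: nothing is physically loaded.
    for meta in HUB_METADATA:
        con.execute(hub_view_sql(meta))
    return con


con = build_demo()
hub_customer_rows = con.execute(
    "SELECT business_key, load_dts FROM hub_customer ORDER BY business_key"
).fetchall()
print(hub_customer_rows)
```

Adding a new hub in this sketch means adding one metadata entry, not writing new SQL.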
Between the presentations (and in the evenings) we had enough time to network. That’s what makes the WWDVC different from other conferences. You can talk to everyone, including all these cutting-edge data geeks, and everyone is welcome. As Sam Benedict wrote on LinkedIn (WWDVC: Surrogate keys are not a solution to infertile Parent Keys):
“In spite of being on the IT leadership side of the equation, I always feel welcome and free to ask any question no matter how ‘Data Vault 101’ it may be” – just an impressive bunch of people, all ready to share their knowledge and experiences.
Day 2 brought us some vendor presentations from
- Ultimate Software – Event-driven Real-time EDW in the cloud,
- Wherescape – Wherescape solution for Data Vaults,
- MID – Model driven DV2 data warehouse – complete example and
- AnalyticsDS – AnalyticsDS – Mapping Manager.
It’s always interesting to see how and why vendors implement Data Vault 2.0 in their tools.
Sanjay Pande states in his talk Agile Big Data warehousing with Data Vault 2.0:
Agile is about continuous improving, not just fast!
Furthermore, he spoke about performance issues with Hive and other tools in the Hadoop universe and how best to extract data from Hadoop, and he recommended several tools. And finally he gave us a gift: the new book he is currently writing.
Besides Claudia Imhoff, I also met Scott W. Ambler (father of Agile Modeling and Disciplined Agile Delivery, DAD) in person for the first time. An amazing presentation about why to be agile and work the agile way, the cultural gap between development and data folks in terms of their maturity in agile development, and database refactoring.
Database refactoring: Evolve the database schema by continuous development. Mark old stuff (schema changes) as deprecated and drop it sometime later. Very interesting stuff everyone should consider when doing agile!
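A minimal sketch of what “deprecate now, drop later” could look like, under my own assumptions: the tables `cust` and `customer`, the registry table `schema_deprecation` and the helper `drop_expired` are hypothetical names, and SQLite is used only so the example runs anywhere. The old table name survives as a view during a transition period and is removed once its deprecation date has passed.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Old schema that existing applications still read from.
con.execute("CREATE TABLE cust (id INTEGER PRIMARY KEY, nm TEXT)")
con.execute("INSERT INTO cust VALUES (1, 'Alice')")

# Refactoring step: introduce the new structure and move the data.
con.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, full_name TEXT)")
con.execute("INSERT INTO customer SELECT id, nm FROM cust")
con.execute("DROP TABLE cust")

# Transition period: keep the old name alive as a deprecated view
# and record when it may be dropped (hypothetical registry table).
con.execute("CREATE VIEW cust AS SELECT customer_id AS id, full_name AS nm FROM customer")
con.execute("CREATE TABLE schema_deprecation (object_name TEXT, drop_after TEXT)")
con.execute("INSERT INTO schema_deprecation VALUES ('cust', '2015-12-31')")


def drop_expired(con, today):
    """Drop every deprecated view whose transition period ended before `today`."""
    expired = con.execute(
        "SELECT object_name FROM schema_deprecation WHERE drop_after < ?", (today,)
    ).fetchall()
    for (name,) in expired:
        # Names come from our own registry table, so formatting them in is acceptable here.
        con.execute(f"DROP VIEW IF EXISTS {name}")
        con.execute("DELETE FROM schema_deprecation WHERE object_name = ?", (name,))
    return [name for (name,) in expired]


# Old readers keep working during the transition period.
old_reader_rows = con.execute("SELECT id, nm FROM cust").fetchall()

dropped_early = drop_expired(con, "2015-07-01")  # still inside the transition period
dropped_late = drop_expired(con, "2016-01-01")   # transition period is over
print(old_reader_rows, dropped_early, dropped_late)
```

The view in this sketch only covers readers of the old name. Applications that still write to it would need something more, for example INSTEAD OF triggers on the view.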
Agile data modelling does not mean not to model. It is evolutionary data modelling. Model when you know what to model and when you need it according to the agile manifesto. Similar to my last blogpost Data Vault KISS - Keep it Small and Simple.
Summary of Scott Ambler’s talk: Data folks, think outside the box!
The second day’s dinner was sponsored by AnalyticsDS. Thanks to Sam Benedict for the awesome evening!
The last day of the WWDVC was all about crazy shirts. Have a look at the amazing tweets. It was a mixture of Big Bang Theory and Magnum. What a funny idea.
Kent filled us in on why we should be virtual:
- Supports agility
- Eliminates ETL bottlenecks
- No need for backups
Another interesting key point Kent added on top:
All presentations that included virtualisation used more or less the same SQL. Why? Because Data Vault is pattern based.
Think about that.
Christian wrapped up how to use multi-active SAT(ellites) as the basis for bitemporality in Data Vault. By using additional error SAT(ellites) you can turn ugly temporal data into well-organised bitemporal SAT(ellites) and correct timelines in the sense of full, overlapping and condensed timelines.
All in all, it was a great event for me! Tons of new ideas stuck in my mind during these days. And great in-depth discussions with Kent, Roelant, Marcel and many more. Thanks to Dan for organizing this awesome event.
What remains are my personal “souvenirs” of the conference:
Regarding modelling techniques
- Managed Self Service BI (M.SS.BI) means writing a lot of data back into the Business Vault.
- Point in Time (PIT) tables are now part of the Business Vault.
- INNER join in Data Vault 2.0 (new option): using Point in Time (PIT) tables to solve the INNER join challenges in Data Vault without using a full timeline. It’s an alternative to full timelines when the volume of data is a performance issue.
With zero records, or ghost records, in SAT(ellites) you build full-timeline rows in PITs only for the necessary and required business keys, improving query speed on virtualized SCDs. With this technique only one ghost record is needed in each SAT(ellite), compared to full-timeline SAT(ellites), which need a ghost record for each business key.
Both techniques are valid options. Which one fits depends on the concerns of the specific case.
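Here is a minimal sketch of the PIT plus ghost record option as I understood it. It is not the official Data Vault 2.0 loading pattern: the table and column names, the readable stand-in hash keys and the helper `load_pit` are my own assumptions, and SQLite is used only to keep it runnable. The satellite holds exactly one ghost record. When a business key has no satellite row yet at a snapshot date, the PIT row points to that ghost record, so the query can use plain INNER equi-joins and still returns every key.

```python
import sqlite3

# Assumed conventions for the single ghost record in the satellite.
GHOST_HK = "0" * 32
GHOST_DTS = "0001-01-01"

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_no TEXT);
CREATE TABLE sat_customer_address (customer_hk TEXT, load_dts TEXT, city TEXT,
                                   PRIMARY KEY (customer_hk, load_dts));
CREATE TABLE pit_customer (customer_hk TEXT, snapshot_dts TEXT,
                           sat_address_hk TEXT, sat_address_load_dts TEXT);
""")

# Readable strings stand in for real hash keys.
con.executemany("INSERT INTO hub_customer VALUES (?, ?)",
                [("hk_c1", "C1"), ("hk_c2", "C2")])
con.executemany("INSERT INTO sat_customer_address VALUES (?, ?, ?)", [
    (GHOST_HK, GHOST_DTS, "(unknown)"),   # the one and only ghost record
    ("hk_c1", "2015-06-01", "Stowe"),
    ("hk_c1", "2015-06-10", "Burlington"),
])                                        # note: C2 has no satellite row at all


def load_pit(con, snapshot_dts):
    """Add one PIT row per hub key for the snapshot date. Keys without a
    satellite row up to that date are pointed at the ghost record."""
    con.execute("""
        INSERT INTO pit_customer
        SELECT h.customer_hk,
               :snap,
               CASE WHEN MAX(s.load_dts) IS NULL THEN :ghost_hk ELSE h.customer_hk END,
               COALESCE(MAX(s.load_dts), :ghost_dts)
        FROM hub_customer h
        LEFT JOIN sat_customer_address s
               ON s.customer_hk = h.customer_hk AND s.load_dts <= :snap
        GROUP BY h.customer_hk
    """, {"snap": snapshot_dts, "ghost_hk": GHOST_HK, "ghost_dts": GHOST_DTS})


for snap in ("2015-06-05", "2015-06-15"):
    load_pit(con, snap)

# Query side: equi-joins only, no outer join and no BETWEEN on a timeline.
rows = con.execute("""
    SELECT h.customer_no, p.snapshot_dts, s.city
    FROM pit_customer p
    INNER JOIN hub_customer h ON h.customer_hk = p.customer_hk
    INNER JOIN sat_customer_address s
            ON s.customer_hk = p.sat_address_hk
           AND s.load_dts = p.sat_address_load_dts
    ORDER BY p.snapshot_dts, h.customer_no
""").fetchall()
for row in rows:
    print(row)
```

In this sketch the outer join moves into the PIT load, which runs once per snapshot, while the queries that run many times stay simple.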
- Using virtualisation and insert-only SAT(ellites) to simplify partitioning and backups of your warehouse. This saves a huge amount of data to back up, since only new data needs to be backed up.
- We, the data folks, are almost 20 years behind the software guys in adopting agile techniques. There is a lot of work to do.
- Agile: Not only time boxes. Doing agile in a mature way means delivering value to production continuously.
- Deprecate “old” data model parts for change management. So there’s no need to refactor immediately all succeeding apps due to data model changes.
- A crazy shirt contest makes a conference unique and easy-going
- Write all my future blog posts in English
Some more impressions: