I had just started at a new customer, where I joined the data warehouse team that was tasked with building the Data Vault data warehouse. We were developing the Data Vault generator, and at that point we used an end date for the business time. I remember how complex it was to update the old records and how much server time that cost, especially when rows had to be added in between existing ones, for example when loading history from old sources. The training is led by Dirk in an interactive way.
The first session at Data Modeling Zone Europe 2018 in Düsseldorf was a session about bitemporal data by Dirk Lerner. The session was, and still is, an extract from his current training, Temporal Data in a Fast-Changing World, which is now available worldwide, in English and German, as an open and in-house training.
In very small groups of two or three, we did some exercises on timelines and Allen relationships. Dirk explained the difference between the technical timeline and the business timeline, and he also supplied some nice synonyms for them. On the project we sometimes had communication problems about those two. So, back at the office, I immediately introduced a name change for the business time. We chose state_date for the business date and load_date for the assertion time (the moment the data arrived at the data warehouse).
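To give an idea of what the Allen relationships from those exercises look like, here is a minimal Python sketch that names the relation between two periods. It is not taken from Dirk's material: the function name allen_relation and the example dates are my own assumptions, and the sketch assumes that every period starts strictly before it ends.

```python
from datetime import date


def allen_relation(a_start, a_end, b_start, b_end):
    """Name the Allen relation of period A relative to period B.

    Hypothetical helper for illustration; assumes start < end for both periods.
    """
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    # From here on the two periods share at least some time.
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    return "overlaps" if a_start < b_start else "overlapped-by"


# Two consecutive states of the same record: the first one "meets" the second.
print(allen_relation(date(2018, 1, 1), date(2018, 6, 1),
                     date(2018, 6, 1), date(2018, 9, 1)))
```

Thirteen relations are possible in total; the order of the checks matters, because the last few branches rely on the earlier ones having ruled out shared start or end points.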
Dirk also explained the theory very clearly. He introduced us to the world of Allen relationships and to the terms for the different kinds of temporality that exist, such as nontemporal, unitemporal and bitemporal. With his presentation in hand, I was able to convince my team to change our approach. Dirk's theory helped me to introduce an insert-only architecture. Thanks to the exercises I had done, I stood strong in the discussion with the team about getting rid of the end-date columns. This column can be calculated, so it does not need to be stored.
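The idea that the end date can be calculated instead of stored can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not our actual generator code: the rows are simple tuples of (key, state_date, value), the function name derive_end_dates is made up, and an in-memory list stands in for the table.

```python
from datetime import date


def derive_end_dates(rows):
    """Return (key, state_date, end_date, value) for insert-only rows.

    The end date of a row is the state_date of the next row for the same key;
    the most recent row per key gets None (still valid).
    """
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    result = []
    for i, (key, state_date, value) in enumerate(ordered):
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        end_date = nxt[1] if nxt is not None and nxt[0] == key else None
        result.append((key, state_date, end_date, value))
    return result


rows = [
    ("C1", date(2018, 1, 1), "Amsterdam"),
    ("C1", date(2018, 6, 1), "Utrecht"),
]
# Old history arrives later: it is simply appended, no existing row is updated.
rows.append(("C1", date(2017, 3, 1), "Rotterdam"))

for row in derive_end_dates(rows):
    print(row)
```

Because the end date is derived at read time, adding a row in between existing ones is just another insert. With a stored end date, the neighbouring row would have to be updated as well.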
Later in the project, this decision proved to be key in loading very old data into our data warehouse. We had to load 120 backup databases into the data warehouse. I was assigned this task, which took a lot of work because of the small differences in the schemas between the backups. I already had a lot to do getting the data right and uploading it into the data warehouse. The latter was a lot easier because of the insert-only architecture we used. We saved 1.5 terabytes of expensive disk space and reduced the data warehouse to 400 gigabytes. The old backups could be deleted, and we can access all the data, all the time. Before that change, the customer just reloaded a backup when needed.
Dirk's session at DMZ Europe helped me to understand temporal data correctly and to explain the concepts to business analysts and data specialists. I used Dirk's examples to show them the difference between NowNow, ThenThen and ThenNow. It is still difficult to get my head around, but at least I feel confident with the knowledge I acquired at Dirk’s bitemporal session.
Finally, I can recommend Dirk's training without doubt. If you are thinking about bitemporal data in your data warehouse or if you have problems with it, you should definitely contact Dirk.
Tijs van Rinsum, Qvada
Author of the book De Data Gastronoom. It’s a story about data, not a technical book, written for managers and people outside our field. Two guys in a restaurant talk about data topics, such as how to deal with history, what to make of Scrum, and more. It’s a readable book for anyone.