Follow Up Data Vault EXASOL Webinar

In July 2016, Mathias Brink and I gave a webinar on how to implement Data Vault on an EXASOL database. Read more about it in my previous blog post or watch the recording on YouTube.

Afterwards I received a lot of questions about our webinar. I'll now answer all the questions I have received so far. If you have further questions, feel free to ask via my contact page, via Twitter, or write a comment right here.

Asked and answered

Question: Why are the Surrogate Keys implemented as DECIMAL(18,0) and not as INTEGER? Are there special performance or space consumption reasons?

Answer: No, neither. In EXASOL, INTEGER is internally the same as DECIMAL(18,0), so there is no performance or space difference between the two.
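You can check this equivalence yourself in the metadata: EXASOL reports an INTEGER column with the same type as a DECIMAL(18,0) column. A small sketch (the table and column names are made up for illustration):

```sql
-- INTEGER is internally equivalent to DECIMAL(18,0) in EXASOL
CREATE TABLE KEY_TYPE_TEST (
    SK_DECIMAL DECIMAL(18,0),
    SK_INTEGER INTEGER
);

-- Compare the types both columns get in the system metadata
SELECT COLUMN_NAME, COLUMN_TYPE
FROM   EXA_ALL_COLUMNS
WHERE  COLUMN_TABLE = 'KEY_TYPE_TEST';
```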

Q: Why is there no partitioning (distribution) on the tables? Is there no need for an MPP partitioning strategy, and no impact on query performance?

A: We didn't use Distribution Keys because a) we wanted to keep the model simpler, and b) the DDL was developed for the EXASolo (a one-node system), which does not need Distribution Keys. Distribution Keys become important when the EXASOL database has more than one node.
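On a multi-node cluster, you would typically distribute a Hub and its Satellites by the same surrogate key, so that rows that join end up on the same node. A hedged sketch (the table and column names are illustrative, not taken from the webinar model):

```sql
-- Distribute Hub and Satellite on the same key so that
-- joining rows land on the same node (local joins, no network shuffle)
ALTER TABLE HUB_CUSTOMER DISTRIBUTE BY SK_CUSTOMER;
ALTER TABLE SAT_CUSTOMER DISTRIBUTE BY SK_CUSTOMER;
```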

Q: Are there any considerations to do a timeline partition on Satellites? Is (multi-)partitioning available on EXASOL at all?

A: No, for simplicity we didn't model any kind of partitioning. On the implementation side, EXASOL performs a lot of self-optimization, so there is no need for physical partitioning.

Q: Does EXASOL have temporal tables options?

A: No, not yet.

Q: You implemented physical Foreign Key constraints. Is there any impact on load performance? With Teradata, constraints are normally disabled.

A: We enabled the Foreign Key constraints for data quality reasons. And no, we didn't test the difference in load performance between disabled and enabled Foreign Key constraints. As usual, you can disable the Foreign Key constraints before the load jobs start and enable them again after the data is populated.

Constraint Enabled: Checks whether the value of a Foreign Key attribute exists in the referenced Primary Key.
Constraint Disabled: Like trusted or reliable Foreign Key constraints on other databases, no checks are performed when loading data.
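Per table, the disable-load-enable pattern could look like the following sketch (the table and constraint names are hypothetical, following a common FK_ naming convention):

```sql
-- Disable the Foreign Key check before the load job starts ...
ALTER TABLE LNK_LINE_ITEM MODIFY CONSTRAINT FK_LNK_LINE_ITEM_HUB_ORDER DISABLE;

-- ... run the load jobs ...

-- ... then re-enable it; the check is validated against the loaded data
ALTER TABLE LNK_LINE_ITEM MODIFY CONSTRAINT FK_LNK_LINE_ITEM_HUB_ORDER ENABLE;
```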

If you don’t want to specify Enable/Disable on each ALTER TABLE statement there’s a global default available:

ALTER SYSTEM SET CONSTRAINT_STATE_DEFAULT = 'DISABLE';
ALTER SYSTEM SET CONSTRAINT_STATE_DEFAULT = 'ENABLE';

Q: Are there benchmark comparisons to other databases available?

A: No.

Q: So, what is the main difference between your two Data Vault schemas? How is the optimized one different?

A: The two key points are 1) the reference tables and 2) the redesigned Link LNK_LINE_ITEM. With these changes you'll get a vast performance "upgrade" for the TPC-H benchmark SQLs. Whether other databases show the same behavior as EXASOL has not been tested.

Q: On your Primary Key and Foreign Key constraints, it seems it would be better to name them rather than have the sys_122345 names in the DDL. I assume you created them without names and then reverse-engineered the DDL to end up with those names?

A: Correct. They were first created without names and therefore got system-generated names. From a modeling perspective, it's a bad approach not to name them. I've updated the download files.
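For illustration, an explicitly named variant might look like this sketch (the column and constraint names are hypothetical, following a common PK_/FK_ convention, not the exact DDL from the download files):

```sql
CREATE TABLE HUB_ORDER (
    SK_ORDER    DECIMAL(18,0) NOT NULL,
    BK_ORDERKEY DECIMAL(18,0) NOT NULL,
    LOAD_DTS    TIMESTAMP     NOT NULL,
    REC_SRC     VARCHAR(100)  NOT NULL,
    -- named constraint instead of a system-generated sys_... name
    CONSTRAINT PK_HUB_ORDER PRIMARY KEY (SK_ORDER)
);
```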

Q: Any reason why the Hub Alternate Keys (aka Business Keys) inherited by the Links are optional? Ditto on Hubs HUB_PART and HUB_ORDER, where the Alternate Keys are optional.

A: No, it's an error in the model. It shouldn't be there. I've updated the download files.

Q: Did you record the load metrics anywhere? How long to load the Hubs and how many rows?

A: EXASOL records this information in log tables (hint: for this, auditing needs to be enabled at the database level; follow the link), e.g.:

SELECT *
FROM EXA_DBA_AUDIT_SQL
WHERE COMMAND_CLASS = 'DML'
  AND UPPER(SQL_TEXT) LIKE '%INSERT%HUB%';
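To answer the "how long and how many rows" part directly, the audit table can also be queried for durations and row counts. A sketch, assuming EXA_DBA_AUDIT_SQL exposes the columns START_TIME, STOP_TIME, and ROW_COUNT:

```sql
-- Load duration and row count per Hub INSERT statement
SELECT SQL_TEXT,
       START_TIME,
       STOP_TIME,
       SECONDS_BETWEEN(STOP_TIME, START_TIME) AS DURATION_SEC,
       ROW_COUNT
FROM   EXA_DBA_AUDIT_SQL
WHERE  COMMAND_CLASS = 'DML'
  AND  UPPER(SQL_TEXT) LIKE '%INSERT%HUB%';
```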


So long,
Dirk

 
