This is documentation for Schema and Object meta data mappings for the Automated Data Provisioning (ADP) Framework using BIML, SSIS and SQL. The getting started documentation can be found here.
The ADP Framework has a meta data repository for meta data and how data transfer is mapped across the meta data which is ultimately used to generate data loads and provide data lineage logging. The meta data repository is a bunch of tables that are created as an extension in the SSISDB. This blog documents these tables and describes how they are intended to be used.
Here is the diagram:
Data Object Tables
semanticinsight.system_component – This holds details and self referencing mappings of logical system components.
It’s common for data provision platform to be implemented in a hierarchy of system components such as data sources, stage databases, data marts ODS’s, data vaults and/or data warehouses. Sometimes logical sub-groupings are required in order to meet load provisioning and dependency requirements. The table is designed and intended to store components that may be a simple logical grouping or logically represent a physical component e.g. database, file share, blob storage or data lake store. Currently the framework is setup for traditional RDBMS data provisioning but the intention is to extend it for other nosql system components such as file shares, data lakes, etc.
semanticinsight.data_object – This holds details of objects that are at table level e.g. tables, views and procedures. It also holds details about how the data is formatted and should be loaded.
semanticinsight.data_schema – Data objects may be further classified into logical groups for security, maintenance and logical readability. Currently this table isn’t fully de-normalised and also holds the database name. This is for convenience since this table is intended to be configured for the solution and there is no front end for database as of yet.
semanticinsight.data_object_type – Defines what type a data object can be. Currently it can only be a Table, View and StoredProcedure.
semanticinsight.data_attribute – Defines the columns or attributes that data object can have and also the their data type constraints.
Data Load Mapping Tables
These tables hold details about how the meta data is mapped into the data provisioning solution.
semanticinsight.data_schema_mapping – maps data flow from source system component schema to another.
semanticinsight.data_object_mapping – maps data flow from a source schema data object to another.
semanticinsight.data_attribute_mapping – You’ve guessed it; maps data flow from a source data object attribute to another.
The framework and solution does not allow breaking the hierarchy i.e. sharing objects or attributes across schemas and databases. This is by design because I hate spaghetti data platforms – solutions should have clean logical layers. A skill in building data platforms is providing simple solutions to complex problems.
The database designers amongst us may notice that the higher level mappings of data objects and data schemas could just be implied by the data attribute mapping which is the lowest level mapping. Mappings at multiple levels is a very deliberate design decision.
The objective is to automate delivery but we need to give the framework some high level config to go on. This is what the following tables are for that should be manually configured for a specific solution. These tables should can be populated by modifying the stored procedure called semanticinsight.configure_system_component which is called in the meta data BIML scripts provided:
The following tables can be automatically populated and mapped with careful design by the framework which saves us a lot of time since data objects and their attributes can run into their 1000’s.
This is the demo and getting started setup that the GitHub project comes with. It should be fairly intuitive.
Basically it shows we have 2 system components grouped into a solution called “Adventure Works BI”. System components must have a root node. The table has a relationship for it’s parent system component. It also has a relationship directly to the root component for any system component which I added and found it made my coding life a lot easier for querying the data that is needed.
We can see in the schema and schema mapping tables that the 2 databases Adventure Works and Stage are mapped together across schema’s with identical names. This is not mandatory and schema’s can be mapped as required and the automated development framework will create the data objects in the schema’s as described in this table.
Here is another example. This example might be relevant if we have multiple operational source databases in multiple geo-graphic regions loading into a single stage database for further integration. In this case we can’t use like for like schema names because the table names will clash. We could use the table names to make the distinction but for many reasons I won’t go into here (the biggest one code re-use) it’s better to keep common table names and use the schema name to make the distinction.