I’m releasing my automated development framework for data provisioning onto GitHub. I’m doing it in stages to make it more manageable for myself. Why am I doing this? Because I like coding and building things, and maybe someone will get some value from it.
To use it, expect to have or build a working knowledge of the following:
- Data Warehouse, ETL & ELT Architectural Design Patterns
- SQL Server
- BIML Express
- T4 Templates
It’s a development framework for techies. Whilst it is set up and ready to go with examples, every project has subtle design differences that will require configuration tweaks and/or extensions. The aim of the framework is tailored code re-use, thus:
- Saving many (in fact rather a lot of) man hours
- Providing a flexible framework
- Providing an agile framework: steam ahead and don’t worry about having to rework stuff
- Providing robust, high quality deliverables with less human error
- Not wasting time on low level plumbing, so the team can focus on the difficult bits, e.g. data integration & BI transforms
It is not a tool for someone with no knowledge, experience or requirements to create an off the shelf MI platform. I’ve spent a long time delivering MI platforms and in my humble experience every project has subtle differences that will make or break it, hence a highly flexible and agile framework is the way to go. Trying to shoehorn specific requirements into a generic solution or, even worse, data into a generic data model never leads to happiness for anyone.
I’ll assist as much as possible (if asked) to help folks understand and make use of the assets.
This release focuses on the core assets for delivering a simple bulk loaded stage layer in less than 2 minutes, with a full metadata repository and ETL with data lineage and logging. In this release:
- Metadata management repository
- Metadata SQL Server scrapers to automatically fill the repository and map data flows at attribute level
- Automated DDL creation of database tables
- Automated ETL creation of OLEDB bulk load packages
- .Net assembly to manage BIML integration with metadata repository
It’s set up to use Adventure Works and can very quickly be changed to use any other SQL Server database(s) as source databases. This is because the metadata is scraped automatically from SQL Server. As the framework is extended I’ll add other source scrapers.
As it turns out, Adventure Works was a good database to use because it uses all of the SQL Server data types and some custom data types too.
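To give a feel for the idea of driving DDL from scraped metadata, here is a minimal sketch. It is not the framework's code (the framework uses BIML, T4 and a .Net assembly); Python is swapped in purely for illustration. The `Column` class, the `sql_type` and `stage_table_ddl` functions, the `stage` schema name and the sample rows are all hypothetical. The only assumption about SQL Server is that column metadata looks roughly like rows from `INFORMATION_SCHEMA.COLUMNS`, where a character length of -1 means `max`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """One attribute row, shaped like a row from INFORMATION_SCHEMA.COLUMNS (hypothetical)."""
    name: str
    data_type: str
    max_length: Optional[int] = None  # -1 is treated as MAX
    precision: Optional[int] = None
    scale: Optional[int] = None
    nullable: bool = True


def sql_type(col: Column) -> str:
    """Render the type with its length or precision/scale where the type needs one."""
    t = col.data_type.lower()
    if t in ("char", "varchar", "nchar", "nvarchar", "binary", "varbinary"):
        length = "max" if col.max_length == -1 else str(col.max_length)
        return f"{t}({length})"
    if t in ("decimal", "numeric"):
        return f"{t}({col.precision}, {col.scale})"
    return t  # int, datetime, money, etc. need no arguments


def stage_table_ddl(schema: str, table: str, columns: List[Column]) -> str:
    """Build a CREATE TABLE statement for a stage table from column metadata."""
    lines = []
    for col in columns:
        null = "NULL" if col.nullable else "NOT NULL"
        lines.append(f"    [{col.name}] {sql_type(col)} {null}")
    body = ",\n".join(lines)
    return f"CREATE TABLE [{schema}].[{table}] (\n{body}\n);"


# Made-up sample metadata, not taken from the repository.
columns = [
    Column("ProductID", "int", nullable=False),
    Column("Name", "nvarchar", max_length=50, nullable=False),
    Column("ListPrice", "decimal", precision=19, scale=4),
]
print(stage_table_ddl("stage", "Product", columns))
```

The point of the sketch is only that once the metadata is in a repository, table definitions become a mechanical rendering step, which is where the time saving comes from.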
There’s loads more to add that will come in further releases. This is my initial list:
- Patterns for loading other layers – probably the DW layer initially
- MDS integration for metadata repository
- Other stage loading BIML templates for MDS, Incremental Loads, CDC Loads
- Automated stage indexing
- Staging archive & retrieval
- Meta scrapers to support other data source types
- Tools to help generate metadata for flat files
- Isolated test framework for loading patterns
- Data lineage, dictionary, metadata and processing reports
- Statistical process control – track and predict loading performance
The Good Stuff
I don’t want to procrastinate over documentation too much but will flesh out more detail as and when I can. Onto the good stuff.