Automated Data Provisioning Framework – Release 1

I’m releasing my automated development framework for data provisioning on GitHub. I’m doing it in stages to keep it manageable for myself. Why am I doing this? Because I like coding and building things, and maybe someone will get some value from it.

Setting Expectations

 

To use it, expect to have or build a level of knowledge in the following:

  • Data Warehouse, ETL & ELT Architectural Design Patterns
  • SSIS
  • SQL Server
  • C#
  • BIML Express
  • T4 Templates

It’s a development framework for techies. Whilst it is set up ready to go with examples, every project has subtle design differences that will require configuration tweaks and/or extensions. The aim of the framework is tailored code re-use, thus:

  • Saving many (in fact rather a lot of) man hours
  • Providing a flexible framework
  • Providing an agile framework – steam ahead and don’t worry about having to rework stuff
  • Providing robust, high-quality deliverables with less human error
  • Not wasting time on low-level plumbing, so the team can focus on the difficult bits – e.g. data integration & BI transforms

It is not a tool that lets someone with no knowledge, experience or requirements create an off-the-shelf MI platform. I’ve spent a long time delivering MI platforms and in my humble experience every project has subtle differences that will make or break it, hence a highly flexible and agile framework is the way to go. Trying to shoehorn specific requirements into a generic solution or, even worse, data into a generic data model never leads to happiness for anyone.

I’ll assist as much as possible (if asked) to help folks understand and make use of the assets.

Release 1

 

This release focuses on the core assets for delivering a simple bulk-loaded stage layer in less than 2 minutes, with a full metadata repository and ETL with data lineage and logging. In this release:

  • Metadata management repository
  • Metadata SQL Server scrapers to automatically fill the repository and map data flows at attribute level
  • Automated DDL creation of database tables
  • Automated ETL creation of OLEDB bulk load packages
  • .NET assembly to manage BIML integration with the metadata repository
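
To give a feel for what automated DDL creation from metadata means, here is a minimal sketch. It is written in Python purely for illustration; the framework itself is built on C#, BIML and T4, and none of the names below (`ColumnMeta`, `render_type`, `build_stage_ddl`, the `stage` schema) come from it. They are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ColumnMeta:
    """One attribute-level metadata row (hypothetical shape, not the framework's schema)."""
    name: str
    data_type: str
    max_length: Optional[int] = None  # None for types without a length; -1 means MAX
    nullable: bool = True


def render_type(col: ColumnMeta) -> str:
    """Render the SQL Server type, adding a length where the metadata carries one."""
    if col.max_length is None:
        return col.data_type
    length = "max" if col.max_length == -1 else str(col.max_length)
    return f"{col.data_type}({length})"


def build_stage_ddl(schema: str, table: str, columns: List[ColumnMeta]) -> str:
    """Build a CREATE TABLE statement for a stage table from column metadata."""
    lines = []
    for col in columns:
        null_clause = "NULL" if col.nullable else "NOT NULL"
        lines.append(f"    [{col.name}] {render_type(col)} {null_clause}")
    body = ",\n".join(lines)
    return f"CREATE TABLE [{schema}].[{table}] (\n{body}\n);"


# Illustrative metadata rows; a real run would read these from the repository.
columns = [
    ColumnMeta("ProductID", "int", nullable=False),
    ColumnMeta("Name", "nvarchar", 50, nullable=False),
    ColumnMeta("Color", "nvarchar", 15),
]
ddl = build_stage_ddl("stage", "Product", columns)
print(ddl)
```

The point of the pattern is that the table definition is never hand-written: change the metadata and the DDL follows.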

 

Framework Stage

It’s set up to use Adventure Works and can very quickly be changed to use any other SQL Server database(s) as the source. This is because the metadata is scraped automatically from SQL Server. As the framework is extended I’ll add other source scrapers.
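
As a rough idea of what scraping metadata from SQL Server can look like, the sketch below pairs a query against the standard `INFORMATION_SCHEMA.COLUMNS` view with a small function that groups the rows per table. This is an assumption about the general approach, not the framework's actual scraper, which may well read other catalog views. It is in Python for illustration only, the sample rows are hand-written stand-ins for a query result, and `group_by_table` is a hypothetical helper.

```python
from collections import defaultdict

# Standard SQL Server catalog view; the framework's own scraper may query something else.
SCRAPE_COLUMNS_SQL = """
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, ORDINAL_POSITION
FROM INFORMATION_SCHEMA.COLUMNS
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
"""


def group_by_table(rows):
    """Group (schema, table, column, type, position) rows into per-table column lists."""
    tables = defaultdict(list)
    # Sort defensively so column order does not depend on the order rows arrive in.
    for schema, table, column, data_type, _position in sorted(
        rows, key=lambda r: (r[0], r[1], r[4])
    ):
        tables[f"{schema}.{table}"].append((column, data_type))
    return dict(tables)


# Hand-written rows standing in for the result of running the query above.
sample_rows = [
    ("Production", "Product", "Name", "nvarchar", 2),
    ("Production", "Product", "ProductID", "int", 1),
    ("Person", "Person", "BusinessEntityID", "int", 1),
]
repository = group_by_table(sample_rows)
print(repository)
```

Because the catalog is queried rather than typed in, pointing the scraper at a different SQL Server database is enough to refill the repository.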

As it turns out, Adventure Works was a good database to use because it uses all of the SQL Server data types and some custom data types too.

Release n

 

There’s loads more to add that will come in further releases. This is my initial list:

  • Patterns for loading other layers – probably the DW layer initially
  • MDS integration for metadata repository
  • Other stage loading BIML templates for MDS, Incremental Loads, CDC Loads
  • Automated stage indexing
  • Staging archive & retrieval
  • Metadata scrapers to support other data source types
  • Tools to help generate metadata for flat files
  • Isolated test framework for loading patterns
  • Data lineage, dictionary, metadata and processing reports
  • Statistical process control – track and predict loading performance

The Good Stuff

 

I don’t want to procrastinate over documentation too much but will flesh out more detail as and when I can. On to the good stuff.

 
