ADP Framework: Getting Started

This is the getting started guide for the Automated Data Provisioning (ADP) Framework using BIML, SSIS and SQL; Scroll to the bottom to see a growing list of detailed docs.

Much of the what is referred to like database names is configurable. However the code currently in the GitHub repo will work immediately without any extra tweaking. I recommend following this to the letter before attempting to reconfigure the framework just to build familiarity.

Recommended Tools

Visual Studio 2017 SSDT now has support for SSIS. However we still currently need VS installs for both 2015 and 2017 because BIML Express doesn’t have an installer yet for Visual Studio 2017. I recommend installing Visual Studio 2017 for the full blown development experience and Visual Studio 2015 SQL Server Data Tool (SSDT) for developing SSIS. Since it keeps the disk footprint as light as possible.

You can also install Visual Studio 2017 SSDT if you wish but you won’t be able to use BIML Express with it until Varigence release an installer.

Step 1 – Create Databases


Step 1

Step 2 – Build Semantic Weave


  • Open the semantic.weave solution in Visual Studio 2017
  • Review the connection string in SemanticInsight.Weave.dll.Config and ensure it points at your SSISDB
  • Rebuild the solution
  • Publish the SSISDB project to the SSISDB database; ensure the database deploys successfully and that the SemanticInsight schema tables and procs exist in the database – see images below
  • Navigate to the bin folder for the Semantic.Weave project and get the path since we’re going to need to reference it in our BIML scripts later e.g

step 2

Step 2 a

Step 3 – Reconfigure

NOTE: The global file generally works with biml script without having to select it. Currently however, it has to be selected because we’re using to house our assembly reference in a single place.

  • Open the BIML solution in Visual Studio 2015
  • The “1 – ProjectConnections.biml” in both projects BIML ETL and BIML Utility contains the environment connections that will be used to create the DB connections. Open this file and review the connection strings to ensure they are correct for each database:
    • SSISDB
    • Stage
    • AdventureWorks
  • You’ll notice the connections also have annotation for GUID’s. This is so we can control and ensure the GUIDs are consistent when connections are created. If you want to generate new GUID’s this can be by simply running the following in SQL Server Management Studio
SELECT newid()

step 3

  • Now open the following “0 – Global.biml” and ensure the Assembly Name reference points to where the Semantic.Weave.dll is on your machine. See Step 2. Ensure to escape replace the backslashes in the path using double backslashes.


Step 4 – Scrape Adventure Works Meta Data


  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorks.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorks.dtsx that is created


The BIML script created a package and 2 project connections. This package does 2 things:

1. It executes the the stored procedure [semanticinsight].[configure_system_component]. This procedure populates the table [semanticinsight].[system_component] that defines the solution component architecture in a parent child structure. e.g. Source Databases, Stage Databases, etc. This procedure is an inflection point to building your own solution. Use this procedure to place logic to populate the component structure for your own solution.

select * from [semanticinsight].[system_component]


[semanticinsight].[configure_system_component] also populates the system component schema’s and maps them together for data loading. How you configure these tables will subsequently affect how meta data is mapped, what database objects are created and what load packages are created.

select * from [semanticinsight].[data_schema]
select * from [semanticinsight].[data_schema_mapping]


2. The package queries the AdventureWorks for schema information and loads that data into the SSISDB meta data repository tables. You can check this by looking at the data in the tables:

select * from [semanticinsight].[data_object]
select * from [semanticinsight].[data_attribute]


Step 5 – Build the Stage Database


  • Press ctrl and select the following 3 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – StageTableDefinitions.ADWorks.biml
    • 3 – CreateTable.biml
  • Execute the package called CreateTables_Stage.dtsx that is created
  • Check the stage data base for newly created tables and schemas



The StageTableDefinitions.ADWorks.biml script uses the meta data and component schema mappings created in step 4 to provide target table definitions for the stage database. The

Step 6 – Scrape Stage Meta Data

  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorksStage.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorksStage.dtsx that is created

This package works in much the same way as the explanation in step however this time we’ve scraped the schema information from the Stage database tables that we created in the Step 5.


Step 7 – Create Packages & Map Meta Data


  • Press ctrl and select the following 3 files in the BIML ETL project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – Table.Stage.ADWorks.biml
    • 3 – Package.OLEDBBulk.Single.biml


If you’ve completed all the steps to here you’ve just automatically created the stage load of 88 tables and nearly a 1000 columns. The package will have events embedded in to capture row stats, operational logging and data lineage.

If you’d prefer to have 1 package per table load repeat this step but use the 3 – Package.OLEDBBulk.Multiple.biml instead of the 3 – Package.OLEDBBulk.Single.biml. This will create 88 packages 1 for each table.


You can then repeat the step again with the 3 – Package.OLEDBBulk.Multiple.Master.biml and this will create a master package that executes all of the child load packages.


As the packages are created the BIML script calls the [semanticinsight].[]map_data_attributes] procedure using Semantic.Weave that maps the meta data together in the meta data repository exactly as the data is provisioned by the load packages. This mapping id is then tied into the package execution logging giving a fully integrated meta data repo, logging and data lineage.

select * from [semanticinsight].[data_object_mapping]
select * from [semanticinsight].[data_attribute_mapping]


Step 8 – Load Stage


  • Execute either the Component_OLEDBBulk.dtsx or the Master Adventure Works Stage.dtsx if you’re using the multple package approach.


  • Review the logging information e.g. please note the id’s will obviously be different in your implementation change them as required. Note that the execution_id in the semanticinsight.execution_id is the SSIS server execution id. This is captured so SSIS extension logging can be tied to the native SSISDB logging when required. It will default to 0 when run locally and not on the SSIS server.

select * from Stage.Production.Product

select * from [semanticinsight].[process] where process_id = 1
select * from semanticinsight.data_object_mapping where data_object_mapping_id = 67
select * from semanticinsight.process_data_object_stats s where s.process_id = 1 and s.data_object_mapping_id = 67


Other Docs




Automated Data Provisioning Framework – Release 1

I’m releasing my automated development framework for data provisioning onto GitHub. I’m doing it in stages just to make it more manageable for myself. Why am I doing this? because I like coding, building things and maybe someone will get some value from it.

Setting Expectation


To use it expect to have or bulid a level of knowledge with the following skills:

  • Data Warehouse, ETL & ELT Architectural Design Patterns
  • SSIS
  • SQL Server
  • C#
  • BIML Express
  • T4 Templates

It’s a development framework for techies and whilst it is setup ready to go with examples with all  projects there are always subtle design differences that will require configuration tweaks and or extensions. The aim of the framework is tailored code re-use thus:

  • Saving many (in fact rather a lot of) man hours
  • Provide a flexible framework
  • Provide an agile framework – steam ahead and don’t worry having to rework stuff
  • Provide robust and high quality deliverable’s with less human error
  • Don’t waste time on level plumbing and allow the team to focus on the difficult bits – e.g. data integration & BI transforms

It is not a tool for someone with no knowledge, experience or requirements to create an off the shelf MI platform. I’ve spent a long time delivering MI platforms and in my humble experience every project has subtle differences that will make or break it, hence a highly flexible and agile framework is the way to go. Trying to shoe horn specific requirements into generic solution or even worse, data into a generic data model never leads to happiness for anyone.

I’ll assist as much as possible (if asked) to help folks understand and make use of the assets.

Release 1


This release focuses on the core assets for delivering a simple bulk loaded stage layer in less than 2 minutes with full a meta data repository and ETL with data lineage and logging. In this release:

  • Metadata management repository
  • Metadata SQL Server scrapers to automatically fill the repository and map data flows at attribute level
  • Automated DDL creation of database tables
  • Automated ETL creation of OLEDB bulk load packages
  • .Net assembly to manage BIML integration with metadata repository


Framework Stage

It’s set up to use adventure works and can very quickly be changed to use any other SQL Server database(s) as source databases. This is because the metadata is scraped automatically from SQL Server. As the framework is extended I’ll add other source scrapers.

As it turns out Adventure Works was a good database to use because it uses all of the SQL Server datatypes and some custom data types too.

Release n


There’s loads more to add that will come in further releases. This is my initial list:

  • Patterns for loading other layers – probably the DW layer initially
  • MDS integration for metadata repository
  • Other stage loading BIML templates for MDS, Incremental Loads, CDC Loads
  • Automated stage indexing
  • Staging archive & retrieval
  • Meta scrapers to support other data source types
  • Tools to help generate meta data for flat files
  • Isolated test framework for loading patterns
  • Data lineage, dictionary, metadata and processing reports
  • Statistical process control – track and predict loading performance

The Good Stuff


I don’t want to procrastinate over documentation too much but will flesh out more detail as and when I can. Onto the good stuff.


The Basics – BIML: Project Connection GUIDs

Was in the process of just updating my framework to get it released and documented and came across a slight issue with  my code and Project Connection GUID’s. Am not sure if something has changed since I last used it but I’m sure I didn’t have this issue before but nevertheless.

The Problem


Right… if you’ve been doing SSIS a while then you’ll know connections are hooked up to tasks using a unique identifier that is a GUID. It doesn’t use the connection name which I’ll be honest for practical purposes is pretty annoying sometimes. If you re-create the connection you have to edit all the package tasks that use the connection because it has a different GUID. Well with BIML that ain’t so bad because you can just re-create all the packages quite easily

However… I do get the issue where I have a BIML script that has a code nugget that creates many packages. However on each iteration for each package it creates new connections whether I choose to overwrite them or not this means only one of my packages (the last one) has the connections hooked up properly. Also I may create assets in a few different hits that use the same connections and it’s just to easy to break the GUID’s by re-creating the connections.

The Solution


The solution I’ve settled on is to force the connection GUID’s. The reason being GUID’s constantly being dynamic and out of sync is annoying and it suits me better to have explicit control of what they are. At the <BIML> root the <connection> node doesn’t allow the definition of a GUID or Id. To get around this I’ve used an annotation and then used the annotation in a code nugget in the <package> node to force the GUID’s

This suits my purpose for my framework because I don’t want to repeat code. I can basically have 1 single BIML file that is my global connections definition for all my other BIML scripts that depend on those connections.

So for a very simple example we 2 files:

  1. ProjectConnections.biml – defines my connections
  2. Package.biml – defines my package that uses the connection



<Biml xmlns="">

 <Connection Name="Adventure Works" 
 ConnectionString="Data Source=.;Initial Catalog=AdventureWorks2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 <Annotation AnnotationType="Tag" Tag="GUID">0284217D-A653-4400-87F3-529A569B8F05</Annotation> 
 <Connection Name="Stage" 
 ConnectionString="Data Source=.;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 <Annotation AnnotationType="Tag" Tag="GUID">46A16E16-E57A-4D98-82CA-13AB3F011980</Annotation> 
 <Connection Name="SSISDB" 
 ConnectionString="Data Source=.;Initial Catalog=SSISDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 <Annotation AnnotationType="Tag" Tag="GUID">CF98B9CE-7D19-4493-9E33-1026C688F874</Annotation> 



<#@ template language="C#" #>
<Biml xmlns=""> 
        <Package Name="Package Connections 1">
    <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
                  SELECT TOP 0 1
            <# foreach(var connection in RootNode.Connections) {#> 
            <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>">
            </Connection> <#}#> 

        <Package Name="Package Connections 2"> 
    <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
                  SELECT TOP 0 1
            <# foreach(var connection in RootNode.Connections) {#> 
            <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>">
            </Connection> <#}#> 

With BIML express we can select ProjectConnections.biml then Package.biml, right click, hit Generate SSIS Packages and everything is all good. The cool thing is that if you’ve already created the connections as part of the framework build then it doesn’t matter if you overwrite or not when you get message below because the GUID’s are explicitly defined.

connections overwrite

Wrap Up


This is a very basic example of the pattern I’m using in my framework – which I am trying to get out but just have to update and polish a few things. Fundamentally I’m forcing the project connection GUID’s using annotations. I did look for an Id property on the BIML connection object so I don’t have to use annotations but couldn’t find one. I may have overlooked it during my haste and if there is one I’m sure someone will correct me.

There’s a bit more to it when you start handling table definitions and I’ve used linq to sub-select the connections I need for a particular package templates. These more sophisticated examples will be in my framework code.

I do have a metadata repo and you could declare the connections in the repo and control the creation from there using custom C# assembly. However I really don’t like re-producing features that SSIS already has. It already has a repo for connections and environments called SSISDB and my framework extends SSISDB rather than overlapping it which I much prefer since there is no confusion or complexity in the end product of how environments are configured and administrated. I don’t see the point or creating features that SSIS already has and that can used within SSMS.

Automating DW Delivery – T4 Templates & BIML

The awareness of BIML has improved somewhat in the last few years but still very little is known about T4 Templates which is the topic of this post.

Just a bit of fundamentals. Effectively data ETL, ELT and data provisioning platforms are predominantly schema typed and this isn’t going away whilst we’re on silicon chips. The primary reason for this is performance! Even in MPP architectures you’re going to hit a strongly typed schema at some point; you can’t escape the entropy of information.

The Good


SSIS has to size a memory pipeline for it’s given data feed. It does this so it can shovel the data through fast distributed or not; and just to mention the next version of SSIS can be distributed on multiple servers (perhaps we’ll see in Azure 1 day). SSIS provides a hugely flexible and re-usable GUI to pretty much build any workflow and data feed your abilities will allow; so what’s the rub?

The Bad


Each SSIS executable you create has specific use and definition of metadata, with re-usable but very low level features that are click-tastic enough to give you RSI after one large project. It’s labour intensive to the max and subsequently error prone.

Designers cannot separate the metadata definitions from re-usable implementation patterns


This is a real pain… Why can’t I just build a template SSIS package dimension load and fire in my metadata definition for all my dimensions and implement them all with a single click of a mouse getting all the re-use of fully tested code that any sane software engineer would expect? Well you can but not with VS alone; enter BIML



In short BIML (Business Intelligence Markup Language) returns SSIS back to a respectable coding experience where I can define my template SSIS patterns using readable XML, hook up a central metadata repository using C# and generate all my DW feeds. Now we’re talking… proper code re-use delivering specific and fast SSIS executables.

There’s a ton of info on BIML already, if this is new to you stop here and go learn BIML! I have a couple of starter vids and there’s loads of other tech writers on the topic:


T4 Templates


So what’s this about T4 Templates then?

Well BIML is only free to a point (BIML Express). It does have a fully blown IDE (Mist) if you have the bucks. Also a lot of good SSIS patterns for some of the more upstream transformations require relatively complicated stored procedures that can’t be created using BIML templates. You can of course write your own metadata SQL wrapper using C# which is a fair amount of leg work to say the least – it’s a long road I have been on it.

Another option is dynamic SQL – which in my humble opinion are 2 words that should never go together! Why? Because SQL is terrible for text parsing / code generation and all the other reasons that are just too upsetting to list.

Or… you can use T4 Templates!

T4 Templates have been in Visual Studio since 2005. They’re basically a similar concept to BIML except the language can just be any text – not just BIML. I can basically template out any other language using C# to dynamically push in metadata in very much the same way that BIML works. I’m not just limited to a specific language either it could be used to create SQL procedures, Azure Data Factory json pipelines, Azure Data Lake jobs or C#.

It was included in Visual Studio specifically for templating Code Generation for design time or run time. It’s used by visual studio itself to generate code from UI designers such as Entity Framework for example. T4 comes from the abbreviation of:

  • Text – dynamically creates text output using template files
  • Template – a combination of text blocks & control logic written in C# or
  • Transformation – transforms the text in executable code and executes to produce the final output
  • Toolkit – a set of assemblies packaged into Visual Studio

There are 2 types of Templates that can be used:

  • Design Time – Templates are executed at design to generate code
  • Run Time – Executed when the application executes are compiled into classes in the application. They can receive parameters and sit within control logic.

Using T4 Instead of BIML


It’s a non-starter, forget it! Whilst you could script out SSIS packages using the native XML it’s just not (that) human readable. BIML brings the mark-up language to party which is more human readable by a country mile. You’d be mad to try to script out native SSIS XML whilst BIML express is available for free.

Design or Run Time


On the topic of code generating code. Should we be generating and executing code during design time or run time? I’ve seen various flavours of frameworks that do both or somewhere in between.

I’m a firm believer that code generation should only occur during design time or more specifically during a development process. The development process has all the bells and whistles required to help manage risk during a code generation exercise e.g. accidentally loading live data to an unsecured area of access or loading DEV data into live not to mention all the other bugs and failures that could occur.

Do we really want dynamic code executing in live that has never run before? Also debugging and issue resolution is an enormous pain in the neck if the code that ran isn’t available, easy to isolate and debug – dynamic run time frameworks written in anger tend to be low on support features and over complicated!

Also the arguments for dynamic run time seem to be around circumventing bureaucratic change control that was put in place because of a poor development process. The solution to a robust agile BI development isn’t slipping in a cheeky back door to make clandestine changes it is in fact continuous integration which is a whole other detailed topic.

To use T4 templates with BIML frameworks we use the run time execution type but during the development process since we can call the classes using C# that BIML executes during package creation. So in that respect they execute at the run time of BIML execution not the run time of the warehouse load.



I’ve still yet to come across a team fully automating the delivery of a data warehouse using meta-data. The tools are available (and free) so I’m not sure what the barriers are.

I’ll start dumping my BIML and T4 Template framework assets into Git when I get chance and hopefully show T4 working in unison with BIML to do more than just load a basic staging database layer.