ADP Framework: Getting Started

This is the getting started guide for the Automated Data Provisioning (ADP) Framework using BIML, SSIS and SQL; Scroll to the bottom to see a growing list of detailed docs.

Much of the what is referred to like database names is configurable. However the code currently in the GitHub repo will work immediately without any extra tweaking. I recommend following this to the letter before attempting to reconfigure the framework just to build familiarity.

Recommended Tools

Visual Studio 2017 currently has no support for SSIS so we currently need installs for both 2015 and 2017 which is pain because they’re fairly hefty installs but necessary until Microsoft fully release SSDT for VS 2017.

Step 1 – Create Databases

 

Step 1

Step 2 – Build Semantic Weave

 

  • Open the semantic.weave solution in Visual Studio 2017
  • Review the connection string in SemanticInsight.Weave.dll.Config and ensure it points at your SSISDB
  • Rebuild the solution
  • Publish the SSISDB project to the SSISDB database; ensure the database deploys successfully and that the SemanticInsight schema tables and procs exist in the database – see images below
  • Navigate to the bin folder for the Semantic.Weave project and get the path since we’re going to need to reference it in our BIML scripts later e.g
C:\github\SemanticInsight.Weave\SemanticInsight.Weave\bin\Debug\SemanticInsight.Weave.dll

step 2

Step 2 a

Step 3 – Reconfigure

NOTE: The global file generally works with biml script without having to select it. Currently however, it has to be selected because we’re using to house our assembly reference in a single place.

  • Open the BIML solution in Visual Studio 2015
  • The “1 – ProjectConnections.biml” in both projects BIML ETL and BIML Utility contains the environment connections that will be used to create the DB connections. Open this file and review the connection strings to ensure they are correct for each database:
    • SSISDB
    • Stage
    • AdventureWorks
  • You’ll notice the connections also have annotation for GUID’s. This is so we can control and ensure the GUIDs are consistent when connections are created. If you want to generate new GUID’s this can be by simply running the following in SQL Server Management Studio
SELECT newid()

step 3

  • Now open the following “0 – Global.biml” and ensure the Assembly Name reference points to where the Semantic.Weave.dll is on your machine. See Step 2. Ensure to escape replace the backslashes in the path using double backslashes.

step3a

Step 4 – Scrape Adventure Works Meta Data

 

  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorks.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorks.dtsx that is created

step4a

The BIML script created a package and 2 project connections. This package does 2 things:

1. It executes the the stored procedure [semanticinsight].[configure_system_component]. This procedure populates the table [semanticinsight].[system_component] that defines the solution component architecture in a parent child structure. e.g. Source Databases, Stage Databases, etc. This procedure is an inflection point to building your own solution. Use this procedure to place logic to populate the component structure for your own solution.

select * from [semanticinsight].[system_component]

step4a

[semanticinsight].[configure_system_component] also populates the system component schema’s and maps them together for data loading. How you configure these tables will subsequently affect how meta data is mapped, what database objects are created and what load packages are created.

select * from [semanticinsight].[data_schema]
select * from [semanticinsight].[data_schema_mapping]

step4b

2. The package queries the AdventureWorks for schema information and loads that data into the SSISDB meta data repository tables. You can check this by looking at the data in the tables:

select * from [semanticinsight].[data_object]
select * from [semanticinsight].[data_attribute]

step4c

Step 5 – Build the Stage Database

 

  • Press ctrl and select the following 3 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – StageTableDefinitions.ADWorks.biml
    • 3 – CreateTable.biml
  • Execute the package called CreateTables_Stage.dtsx that is created
  • Check the stage data base for newly created tables and schemas

 

step5b

The StageTableDefinitions.ADWorks.biml script uses the meta data and component schema mappings created in step 4 to provide target table definitions for the stage database. The

Step 6 – Scrape Stage Meta Data

  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorksStage.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorksStage.dtsx that is created

This package works in much the same way as the explanation in step however this time we’ve scraped the schema information from the Stage database tables that we created in the Step 5.

step6

Step 7 – Create Packages & Map Meta Data

 

  • Press ctrl and select the following 3 files in the BIML ETL project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – Table.Stage.ADWorks.biml
    • 3 – Package.OLEDBBulk.Single.biml

step7

If you’ve completed all the steps to here you’ve just automatically created the stage load of 88 tables and nearly a 1000 columns. The package will have events embedded in to capture row stats, operational logging and data lineage.

If you’d prefer to have 1 package per table load repeat this step but use the 3 – Package.OLEDBBulk.Multiple.biml instead of the 3 – Package.OLEDBBulk.Single.biml. This will create 88 packages 1 for each table.

step7a

You can then repeat the step again with the 3 – Package.OLEDBBulk.Multiple.Master.biml and this will create a master package that executes all of the child load packages.

step7b

As the packages are created the BIML script calls the [semanticinsight].[]map_data_attributes] procedure using Semantic.Weave that maps the meta data together in the meta data repository exactly as the data is provisioned by the load packages. This mapping id is then tied into the package execution logging giving a fully integrated meta data repo, logging and data lineage.

select * from [semanticinsight].[data_object_mapping]
select * from [semanticinsight].[data_attribute_mapping]

step7c

Step 8 – Load Stage

 

  • Execute either the Component_OLEDBBulk.dtsx or the Master Adventure Works Stage.dtsx if you’re using the multple package approach.

step8

  • Review the logging information e.g. please note the id’s will obviously be different in your implementation change them as required. Note that the execution_id in the semanticinsight.execution_id is the SSIS server execution id. This is captured so SSIS extension logging can be tied to the native SSISDB logging when required. It will default to 0 when run locally and not on the SSIS server.
use SSISDB
GO

select * from Stage.Production.Product

select * from [semanticinsight].[process] where process_id = 1
select * from semanticinsight.data_object_mapping where data_object_mapping_id = 67
select * from semanticinsight.process_data_object_stats s where s.process_id = 1 and s.data_object_mapping_id = 67

step8a

Other Docs

 

 

The Basics – BIML: Project Connection GUIDs

Was in the process of just updating my framework to get it released and documented and came across a slight issue with  my code and Project Connection GUID’s. Am not sure if something has changed since I last used it but I’m sure I didn’t have this issue before but nevertheless.

The Problem

 

Right… if you’ve been doing SSIS a while then you’ll know connections are hooked up to tasks using a unique identifier that is a GUID. It doesn’t use the connection name which I’ll be honest for practical purposes is pretty annoying sometimes. If you re-create the connection you have to edit all the package tasks that use the connection because it has a different GUID. Well with BIML that ain’t so bad because you can just re-create all the packages quite easily

However… I do get the issue where I have a BIML script that has a code nugget that creates many packages. However on each iteration for each package it creates new connections whether I choose to overwrite them or not this means only one of my packages (the last one) has the connections hooked up properly. Also I may create assets in a few different hits that use the same connections and it’s just to easy to break the GUID’s by re-creating the connections.

The Solution

 

The solution I’ve settled on is to force the connection GUID’s. The reason being GUID’s constantly being dynamic and out of sync is annoying and it suits me better to have explicit control of what they are. At the <BIML> root the <connection> node doesn’t allow the definition of a GUID or Id. To get around this I’ve used an annotation and then used the annotation in a code nugget in the <package> node to force the GUID’s

This suits my purpose for my framework because I don’t want to repeat code. I can basically have 1 single BIML file that is my global connections definition for all my other BIML scripts that depend on those connections.

So for a very simple example we 2 files:

  1. ProjectConnections.biml – defines my connections
  2. Package.biml – defines my package that uses the connection

ProjectConnections.biml

 

<Biml xmlns="http://schemas.varigence.com/biml.xsd">

<Connections>
 
 <Connection Name="Adventure Works" 
 ConnectionString="Data Source=.;Initial Catalog=AdventureWorks2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">0284217D-A653-4400-87F3-529A569B8F05</Annotation> 
 </Annotations>
 </Connection>
 
 <Connection Name="Stage" 
 ConnectionString="Data Source=.;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">46A16E16-E57A-4D98-82CA-13AB3F011980</Annotation> 
 </Annotations>
 </Connection>
 
 <Connection Name="SSISDB" 
 ConnectionString="Data Source=.;Initial Catalog=SSISDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">CF98B9CE-7D19-4493-9E33-1026C688F874</Annotation> 
 </Annotations>
 </Connection>
 
 </Connections>
</Biml>

Package.biml

 

<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd"> 
    <Packages> 
        <Package Name="Package Connections 1">
        
            <Tasks>
    <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
      <DirectInput>
                  SELECT TOP 0 1
      </DirectInput>
    </ExecuteSQL>
            </Tasks>
              
            <Connections> 
            <# foreach(var connection in RootNode.Connections) {#> 
            <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>">
                
            </Connection> <#}#> 
            </Connections> 
        </Package>

        <Package Name="Package Connections 2"> 
        
            <Tasks>
    <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
      <DirectInput>
                  SELECT TOP 0 1
      </DirectInput>
    </ExecuteSQL>
            </Tasks>
              
            <Connections> 
            <# foreach(var connection in RootNode.Connections) {#> 
            <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>">
                
            </Connection> <#}#> 
            </Connections> 
        </Package> 
        
    </Packages>
    
</Biml>

With BIML express we can select ProjectConnections.biml then Package.biml, right click, hit Generate SSIS Packages and everything is all good. The cool thing is that if you’ve already created the connections as part of the framework build then it doesn’t matter if you overwrite or not when you get message below because the GUID’s are explicitly defined.

connections overwrite

Wrap Up

 

This is a very basic example of the pattern I’m using in my framework – which I am trying to get out but just have to update and polish a few things. Fundamentally I’m forcing the project connection GUID’s using annotations. I did look for an Id property on the BIML connection object so I don’t have to use annotations but couldn’t find one. I may have overlooked it during my haste and if there is one I’m sure someone will correct me.

There’s a bit more to it when you start handling table definitions and I’ve used linq to sub-select the connections I need for a particular package templates. These more sophisticated examples will be in my framework code.

I do have a metadata repo and you could declare the connections in the repo and control the creation from there using custom C# assembly. However I really don’t like re-producing features that SSIS already has. It already has a repo for connections and environments called SSISDB and my framework extends SSISDB rather than overlapping it which I much prefer since there is no confusion or complexity in the end product of how environments are configured and administrated. I don’t see the point or creating features that SSIS already has and that can used within SSMS.

Introduction to Automating SSIS with BIML

2 videos from our channel giving a practical introduction to automating SSIS with BIML and C#. Part 1 shows the basics of BIML scripting. Part 2 shows using C# and BIML to dynamically create SSIS assets using patterns and meta data.

Part 1: Introduction to BIML Scripting

Part 2: Introduction to Dynamic BIML Scripting with C#