Just a bit of fundamentals. Effectively data ETL, ELT and data provisioning platforms are predominantly schema typed and this isn’t going away whilst we’re on silicon chips. The primary reason for this is performance! Even in MPP architectures you’re going to hit a strongly typed schema at some point; you can’t escape the entropy of information.
SSIS has to size a memory pipeline for it’s given data feed. It does this so it can shovel the data through fast distributed or not; and just to mention the next version of SSIS can be distributed on multiple servers (perhaps we’ll see in Azure 1 day). SSIS provides a hugely flexible and re-usable GUI to pretty much build any workflow and data feed your abilities will allow; so what’s the rub?
Each SSIS executable you create has specific use and definition of metadata, with re-usable but very low level features that are click-tastic enough to give you RSI after one large project. It’s labour intensive to the max and subsequently error prone.
Designers cannot separate the metadata definitions from re-usable implementation patterns
This is a real pain… Why can’t I just build a template SSIS package dimension load and fire in my metadata definition for all my dimensions and implement them all with a single click of a mouse getting all the re-use of fully tested code that any sane software engineer would expect? Well you can but not with VS alone; enter BIML
In short BIML (Business Intelligence Markup Language) returns SSIS back to a respectable coding experience where I can define my template SSIS patterns using readable XML, hook up a central metadata repository using C# and generate all my DW feeds. Now we’re talking… proper code re-use delivering specific and fast SSIS executables.
So what’s this about T4 Templates then?
Well BIML is only free to a point (BIML Express). It does have a fully blown IDE (Mist) if you have the bucks. Also a lot of good SSIS patterns for some of the more upstream transformations require relatively complicated stored procedures that can’t be created using BIML templates. You can of course write your own metadata SQL wrapper using C# which is a fair amount of leg work to say the least – it’s a long road I have been on it.
Another option is dynamic SQL – which in my humble opinion are 2 words that should never go together! Why? Because SQL is terrible for text parsing / code generation and all the other reasons that are just too upsetting to list.
Or… you can use T4 Templates!
T4 Templates have been in Visual Studio since 2005. They’re basically a similar concept to BIML except the language can just be any text – not just BIML. I can basically template out any other language using C# to dynamically push in metadata in very much the same way that BIML works. I’m not just limited to a specific language either it could be used to create SQL procedures, Azure Data Factory json pipelines, Azure Data Lake jobs or C#.
It was included in Visual Studio specifically for templating Code Generation for design time or run time. It’s used by visual studio itself to generate code from UI designers such as Entity Framework for example. T4 comes from the abbreviation of:
- Text – dynamically creates text output using template files
- Template – a combination of text blocks & control logic written in C# or VB.net
- Transformation – transforms the text in executable code and executes to produce the final output
- Toolkit – a set of assemblies packaged into Visual Studio
There are 2 types of Templates that can be used:
- Design Time – Templates are executed at design to generate code
- Run Time – Executed when the application executes are compiled into classes in the application. They can receive parameters and sit within control logic.
Using T4 Instead of BIML
It’s a non-starter, forget it! Whilst you could script out SSIS packages using the native XML it’s just not (that) human readable. BIML brings the mark-up language to party which is more human readable by a country mile. You’d be mad to try to script out native SSIS XML whilst BIML express is available for free.
Design or Run Time
On the topic of code generating code. Should we be generating and executing code during design time or run time? I’ve seen various flavours of frameworks that do both or somewhere in between.
I’m a firm believer that code generation should only occur during design time or more specifically during a development process. The development process has all the bells and whistles required to help manage risk during a code generation exercise e.g. accidentally loading live data to an unsecured area of access or loading DEV data into live not to mention all the other bugs and failures that could occur.
Do we really want dynamic code executing in live that has never run before? Also debugging and issue resolution is an enormous pain in the neck if the code that ran isn’t available, easy to isolate and debug – dynamic run time frameworks written in anger tend to be low on support features and over complicated!
Also the arguments for dynamic run time seem to be around circumventing bureaucratic change control that was put in place because of a poor development process. The solution to a robust agile BI development isn’t slipping in a cheeky back door to make clandestine changes it is in fact continuous integration which is a whole other detailed topic.
To use T4 templates with BIML frameworks we use the run time execution type but during the development process since we can call the classes using C# that BIML executes during package creation. So in that respect they execute at the run time of BIML execution not the run time of the warehouse load.
I’ve still yet to come across a team fully automating the delivery of a data warehouse using meta-data. The tools are available (and free) so I’m not sure what the barriers are.