ADP Framework: Getting Started

This is the getting started guide for the Automated Data Provisioning (ADP) Framework using BIML, SSIS and SQL. Scroll to the bottom for a growing list of more detailed docs.

Much of what is referred to here, such as database names, is configurable. However, the code currently in the GitHub repo will work immediately without any extra tweaking. I recommend following this guide to the letter to build familiarity before attempting to reconfigure the framework.

Recommended Tools

Visual Studio 2017 SSDT now has support for SSIS. However, we currently need Visual Studio installs for both 2015 and 2017 because BIML Express doesn't yet have an installer for Visual Studio 2017. I recommend installing Visual Studio 2017 for the full-blown development experience and Visual Studio 2015 SQL Server Data Tools (SSDT) for developing SSIS, since that keeps the disk footprint as light as possible.

You can also install Visual Studio 2017 SSDT if you wish but you won’t be able to use BIML Express with it until Varigence release an installer.

Step 1 – Create Databases

 

Step 1
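Step 1 boils down to making sure the three databases named in the Step 3 connection list exist: SSISDB, Stage and AdventureWorks. As a minimal sketch (exact names and versions depend on your environment), SSISDB is created by enabling the Integration Services catalog in SSMS, AdventureWorks is restored from the Microsoft sample backup, and the Stage database can be created with something like:

-- SSISDB: create by enabling the Integration Services catalog in SSMS
-- AdventureWorks: restore from the Microsoft sample database backup
-- Stage: an empty database that the framework will populate with schemas and tables in Step 5
create database Stage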

Step 2 – Build Semantic Weave

 

  • Open the semantic.weave solution in Visual Studio 2017
  • Review the connection string in SemanticInsight.Weave.dll.Config and ensure it points at your SSISDB
  • Rebuild the solution
  • Publish the SSISDB project to the SSISDB database; ensure the database deploys successfully and that the SemanticInsight schema tables and procs exist in the database – see images below
  • Navigate to the bin folder for the Semantic.Weave project and note the path, since we'll need to reference it in our BIML scripts later, e.g.
C:\github\SemanticInsight.Weave\SemanticInsight.Weave\bin\Debug\SemanticInsight.Weave.dll

step 2

Step 2 a

Step 3 – Reconfigure

NOTE: The global file generally works with BIML scripts without having to select it. Currently, however, it has to be selected because we're using it to house our assembly reference in a single place.

  • Open the BIML solution in Visual Studio 2015
  • The “1 – ProjectConnections.biml” file in both projects (BIML ETL and BIML Utility) contains the environment connections that will be used to create the DB connections. Open this file and review the connection strings to ensure they are correct for each database:
    • SSISDB
    • Stage
    • AdventureWorks
  • You'll notice the connections also have annotations for GUIDs. This is so we can control the GUIDs and ensure they are consistent when connections are created. If you want to generate new GUIDs you can do so by simply running the following in SQL Server Management Studio:
SELECT newid()

step 3

  • Now open “0 – Global.biml” and ensure the Assembly Name reference points to where the SemanticInsight.Weave.dll is on your machine (see Step 2). Remember to escape the backslashes in the path by doubling them, e.g. C:\\github\\SemanticInsight.Weave\\SemanticInsight.Weave\\bin\\Debug\\SemanticInsight.Weave.dll

step3a

Step 4 – Scrape Adventure Works Meta Data

 

  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorks.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorks.dtsx that is created

step4a

The BIML script created a package and 2 project connections. This package does 2 things:

1. It executes the stored procedure [semanticinsight].[configure_system_component]. This procedure populates the table [semanticinsight].[system_component], which defines the solution component architecture in a parent-child structure, e.g. source databases, stage databases, etc. This procedure is the extension point for building your own solution: use it to hold the logic that populates the component structure for your own solution.

select * from [semanticinsight].[system_component]
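If you want to see the parent-child structure laid out as a hierarchy rather than a flat list, a recursive query along these lines can help. The column names here (system_component_id, parent_system_component_id, system_component_name) are my assumptions about the repository schema, so check them against the deployed SemanticInsight tables first.

-- hedged sketch: walk the component parent/child structure
-- column names are assumed; verify them against [semanticinsight].[system_component]
with component_tree as
(
    select sc.system_component_id,
           sc.parent_system_component_id,
           cast(sc.system_component_name as varchar(4000)) as component_path
    from [semanticinsight].[system_component] sc
    where sc.parent_system_component_id is null

    union all

    select c.system_component_id,
           c.parent_system_component_id,
           cast(t.component_path + ' > ' + c.system_component_name as varchar(4000))
    from [semanticinsight].[system_component] c
    inner join component_tree t on t.system_component_id = c.parent_system_component_id
)
select * from component_tree order by component_path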

step4a

[semanticinsight].[configure_system_component] also populates the system component schemas and maps them together for data loading. How you configure these tables will subsequently affect how metadata is mapped, what database objects are created and what load packages are created.

select * from [semanticinsight].[data_schema]
select * from [semanticinsight].[data_schema_mapping]

step4b

2. The package queries the AdventureWorks database for schema information and loads that data into the SSISDB metadata repository tables. You can check this by looking at the data in the following tables:

select * from [semanticinsight].[data_object]
select * from [semanticinsight].[data_attribute]
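The scrape is collecting the same sort of schema information you can see yourself through the standard catalog views; a query like the one below against the AdventureWorks database shows the kind of table and column metadata that ends up in [semanticinsight].[data_object] and [semanticinsight].[data_attribute].

-- the kind of schema information the scrape collects
-- adjust the database name to the AdventureWorks version you restored
use AdventureWorks
GO

select s.name as schema_name,
       t.name as table_name,
       c.name as column_name,
       ty.name as data_type,
       c.max_length,
       c.is_nullable
from sys.tables t
inner join sys.schemas s on s.schema_id = t.schema_id
inner join sys.columns c on c.object_id = t.object_id
inner join sys.types ty on ty.user_type_id = c.user_type_id
order by s.name, t.name, c.column_id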

step4c

Step 5 – Build the Stage Database

 

  • Press ctrl and select the following 4 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – StageTableDefinitions.ADWorks.biml
    • 3 – CreateTable.biml
  • Execute the package called CreateTables_Stage.dtsx that is created
  • Check the stage database for newly created tables and schemas (a quick verification query is shown below)

 

step5b

The StageTableDefinitions.ADWorks.biml script uses the metadata and component schema mappings created in Step 4 to provide target table definitions for the stage database. The CreateTable.biml script then generates the CreateTables_Stage.dtsx package that creates those tables and schemas in the Stage database.
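A quick way to confirm the stage build worked is to list the schemas and tables that CreateTables_Stage.dtsx created:

use Stage
GO

select s.name as schema_name, t.name as table_name
from sys.tables t
inner join sys.schemas s on s.schema_id = t.schema_id
order by s.name, t.name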

Step 6 – Scrape Stage Meta Data

  • Press ctrl and select the following 2 files in the BIML Utility project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 1 – ProjectConnections.biml
    • 2 – MetaDataScrape.SQLServer.ADWorksStage.biml
  • Execute the package called Sssidb_SqlServerScrape_AdventureWorksBI_AdventureWorksStage.dtsx that is created

This package works in much the same way as described in Step 4, but this time we've scraped the schema information from the Stage database tables that we created in Step 5.

step6

Step 7 – Create Packages & Map Meta Data

 

  • Press ctrl and select the following 4 files in the BIML ETL project in the order listed, right click and select “Generate SSIS Packages”. The order is important and they are named with a number to make it more intuitive:
    • 0 – Global.biml
    • 1 – ProjectConnections.biml
    • 2 – Table.Stage.ADWorks.biml
    • 3 – Package.OLEDBBulk.Single.biml

step7

If you've completed all the steps to here, you've just automatically created the stage load for 88 tables and nearly 1,000 columns. The package has events embedded in it to capture row stats, operational logging and data lineage.

If you'd prefer to have one package per table load, repeat this step but use 3 – Package.OLEDBBulk.Multiple.biml instead of 3 – Package.OLEDBBulk.Single.biml. This will create 88 packages, one for each table.

step7a

You can then repeat the step again with 3 – Package.OLEDBBulk.Multiple.Master.biml, and this will create a master package that executes all of the child load packages.

step7b

As the packages are created, the BIML script calls the [semanticinsight].[map_data_attributes] procedure via Semantic.Weave, which maps the metadata together in the metadata repository exactly as the data is provisioned by the load packages. The mapping ID is then tied into the package execution logging, giving a fully integrated metadata repository, logging and data lineage.

select * from [semanticinsight].[data_object_mapping]
select * from [semanticinsight].[data_attribute_mapping]
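To see which source objects were mapped to which stage objects you can join the mapping table back to [semanticinsight].[data_object]. The column names in this sketch (source_data_object_id, target_data_object_id, data_object_id, data_object_name) are assumptions about the repository schema, so adjust them to whatever your deployed tables actually use.

-- hedged sketch: source to target object mappings; column names are assumed
select m.data_object_mapping_id,
       src.data_object_name as source_object,
       tgt.data_object_name as target_object
from [semanticinsight].[data_object_mapping] m
inner join [semanticinsight].[data_object] src on src.data_object_id = m.source_data_object_id
inner join [semanticinsight].[data_object] tgt on tgt.data_object_id = m.target_data_object_id
order by src.data_object_name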

step7c

Step 8 – Load Stage

 

  • Execute either Component_OLEDBBulk.dtsx or, if you're using the multiple-package approach, Master Adventure Works Stage.dtsx.

step8

  • Review the logging information. Note that the IDs will obviously be different in your implementation, so change them in the queries below as required. The execution_id captured in the semanticinsight logging is the SSIS server execution ID; it is captured so that the SSIS extension logging can be tied to native SSISDB logging when required. It defaults to 0 when packages are run locally rather than on the SSIS server.
use SSISDB
GO

select * from Stage.Production.Product

select * from [semanticinsight].[process] where process_id = 1
select * from semanticinsight.data_object_mapping where data_object_mapping_id = 67
select * from semanticinsight.process_data_object_stats s where s.process_id = 1 and s.data_object_mapping_id = 67
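If you run the packages on the SSIS server rather than locally, the captured execution_id can be joined back to the native SSISDB logging. Something along these lines works, assuming the execution_id is held on the [semanticinsight].[process] table (verify against your deployment):

-- hedged sketch: tie the SemanticInsight logging back to native SSISDB execution logging
-- assumes [semanticinsight].[process] carries the captured execution_id (0 for local runs)
select p.process_id,
       p.execution_id,
       e.package_name,
       e.start_time,
       e.end_time,
       e.status
from [semanticinsight].[process] p
inner join [SSISDB].[catalog].[executions] e on e.execution_id = p.execution_id
where p.execution_id <> 0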

step8a

Other Docs

 

 


The Basics – BIML: Project Connection GUIDs

I was in the process of updating my framework to get it released and documented and came across a slight issue with my code and Project Connection GUIDs. I'm not sure if something has changed since I last used it, and I'm sure I didn't have this issue before, but nevertheless.

The Problem

 

Right… if you've been doing SSIS a while then you'll know connections are hooked up to tasks using a unique identifier, which is a GUID. It doesn't use the connection name, which I'll be honest is pretty annoying for practical purposes sometimes. If you re-create a connection you have to edit all the package tasks that use it, because it has a different GUID. Well, with BIML that isn't so bad because you can just re-create all the packages quite easily.

However… I do get the issue where I have a BIML script with a code nugget that creates many packages. On each iteration, for each package, it creates new connections whether I choose to overwrite them or not, which means only one of my packages (the last one) has the connections hooked up properly. I may also create assets in a few different hits that use the same connections, and it's just too easy to break the GUIDs by re-creating the connections.

The Solution

 

The solution I've settled on is to force the connection GUIDs. GUIDs constantly being dynamic and out of sync is annoying, and it suits me better to have explicit control over what they are. At the <Biml> root the <Connection> node doesn't allow the definition of a GUID or Id. To get around this I've used an annotation, and then used that annotation in a code nugget in the <Package> node to force the GUIDs.

This suits my purpose for my framework because I don’t want to repeat code. I can basically have 1 single BIML file that is my global connections definition for all my other BIML scripts that depend on those connections.

So for a very simple example we have 2 files:

  1. ProjectConnections.biml – defines my connections
  2. Package.biml – defines my package that uses the connection

ProjectConnections.biml

 

<Biml xmlns="http://schemas.varigence.com/biml.xsd">

<Connections>
 
 <Connection Name="Adventure Works" 
 ConnectionString="Data Source=.;Initial Catalog=AdventureWorks2012;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">0284217D-A653-4400-87F3-529A569B8F05</Annotation> 
 </Annotations>
 </Connection>
 
 <Connection Name="Stage" 
 ConnectionString="Data Source=.;Initial Catalog=Stage;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">46A16E16-E57A-4D98-82CA-13AB3F011980</Annotation> 
 </Annotations>
 </Connection>
 
 <Connection Name="SSISDB" 
 ConnectionString="Data Source=.;Initial Catalog=SSISDB;Provider=SQLNCLI11.1;Integrated Security=SSPI;Auto Translate=False;"
 CreateInProject="true">
 <Annotations>
 <Annotation AnnotationType="Tag" Tag="GUID">CF98B9CE-7D19-4493-9E33-1026C688F874</Annotation> 
 </Annotations>
 </Connection>
 
 </Connections>
</Biml>

Package.biml

 

<#@ template language="C#" #>
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Packages>

        <Package Name="Package Connections 1">
            <Tasks>
                <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
                    <DirectInput>SELECT TOP 0 1</DirectInput>
                </ExecuteSQL>
            </Tasks>
            <Connections>
                <# foreach(var connection in RootNode.Connections) { #>
                <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>" />
                <# } #>
            </Connections>
        </Package>

        <Package Name="Package Connections 2">
            <Tasks>
                <ExecuteSQL Name="Test" ConnectionName="SSISDB" ResultSet="None">
                    <DirectInput>SELECT TOP 0 1</DirectInput>
                </ExecuteSQL>
            </Tasks>
            <Connections>
                <# foreach(var connection in RootNode.Connections) { #>
                <Connection ConnectionName="<#=connection.Name#>" Id="<#=connection.GetTag("GUID")#>" />
                <# } #>
            </Connections>
        </Package>

    </Packages>
</Biml>

With BIML Express we can select ProjectConnections.biml then Package.biml, right-click, hit Generate SSIS Packages and everything is all good. The cool thing is that if you've already created the connections as part of the framework build, it doesn't matter whether or not you overwrite them when you get the message below, because the GUIDs are explicitly defined.

connections overwrite

Wrap Up

 

This is a very basic example of the pattern I’m using in my framework – which I am trying to get out but just have to update and polish a few things. Fundamentally I’m forcing the project connection GUID’s using annotations. I did look for an Id property on the BIML connection object so I don’t have to use annotations but couldn’t find one. I may have overlooked it during my haste and if there is one I’m sure someone will correct me.

There's a bit more to it when you start handling table definitions, and I've used LINQ to sub-select the connections I need for particular package templates. These more sophisticated examples will be in my framework code.

I do have a metadata repo, and you could declare the connections in the repo and control their creation from there using a custom C# assembly. However, I really don't like reproducing features that SSIS already has. It already has a repository for connections and environments, called SSISDB, and my framework extends SSISDB rather than overlapping it, which I much prefer since there is no confusion or complexity in the end product over how environments are configured and administered. I don't see the point of creating features that SSIS already has and that can be used within SSMS.

Finer Details – Stream Analytics & PowerBI: Feb 1st Update

So folks might have noticed the Feb 1st update to Stream Analytics. There's a fair bit of stuff in this release. What I am going to focus on, though, is how Stream Analytics now integrates into Power BI.

Power BI gained streaming datasets some time after the Azure Stream Analytics (ASA) integration. Folks who have worked with it might be aware that when you hook up ASA to Power BI it just creates a plain dataset rather than a streaming dataset. When I first did this it jumped out as something that probably needed to be consolidated… and now it has been.

capture

So what does the implementation experience look like now? Well, from the Stream Analytics side nothing much appears to have changed. I have a test query that's not doing much; it just fires through some data. I'm actually firing JSON in through the event hub, but I'll focus on the ASA to Power BI bit. The query is as follows:

WITH customers AS (
 SELECT
 [customer].customerCode,
 [customer].firstname,
 [customer].lastname,
 [customer].addressline1,
 [customer].addressline2,
 [customer].addressline3,
 [customer].postcode,
 [customer].town,
 [customer].county,
 [customer].country,
 [customer].customerEmail,
 [transaction].transactionCode,
 [transaction].status,
 [transaction].[transactionTime],
 [transaction].[transactionAmount]
 FROM
 [evehub]
)

SELECT * INTO [TestReport] FROM customers

When creating the output everything seems as it was before. My assumption was that something would’ve changed here and that I would have to create my streaming dataset end point definition first in Power BI and choose my data set here. But that’s not the case…

capture2

The subtle impact here is that it's not like the API, where I can just fire JSON at the secure endpoint. I still have to authorise with an account! This means you need a Power BI account for your Stream Analytics service, and in a production environment you probably won't want that to be a named user account. Also, if the workspace needs to be shared then it will need to be a Pro account.

So far we've hooked up ASA to Power BI, but we haven't done anything in Power BI itself. Well, it turns out we don't have to manually define the endpoint like we do with API integration. When data starts streaming through, the dataset appears automatically under Streaming Datasets within Power BI. So when I turn on my data stream using a C# app I get the following in Power BI.

capture3

When I click the pencil to edit the data set I can see that it’s created all the attributes, typed them and defaulted history to on.

capture4

Now it seems we have some control of how these data sets are shaped in Power BI and we can flip the history on or off. We also have an end point that we can use with an api, cURL or PowerShell…

capture5

This means we might be able to stream in data to the data set using another api source not just stream analytics. Also it might seem I can do the following:

  • add an attribute that’s not in my ASA stream
  • remove an attribute that is in my ASA stream

Lets see…

Add an Attribute

I created a streaming tile just totting up transactions realtime.

capture6

When I add a new column the streaming tile still displays without issue. However, when I go to create a new report, or edit the existing report that I pinned to the dashboard from my dataset, I start hitting issues. To see the new column in the dataset in the report editor I have to hit refresh. After doing this my dataset and report are broken.

capture7

capture8

Editing the data set again, removing the column and hitting refresh doesn’t fix the issue. It seems I have to trash and recreate the data set.

How about ASA? Well ASA also had a problem. It didn’t stop the job but failed to send the transactions meaning the seemingly unaffected tile won’t update with any new data. Note the lack of column name in both the Power BI and ASA exception.

capture10

 

Remove an Attribute

What happens when we remove an attribute?

Well… the same thing as we might expect from trying to add a column.

Conclusion

This is a good change and consolidates the features of streaming data sets into Power BI for API’s and Stream Analytics. We gain some further control over datasets but it seems ASA is very tightly bound to the data set definition and in reality we can’t do much without breaking it.

It might mean that we can stream data in from another source not just ASA since we now have an endpoint we can use… I haven’t tried this yet.

All in all I’m still left a bit wanting in the mashing features for realtime data in Power BI. I’m sure it will get there… Microsoft are doing a great job… We are a tough bunch to please these days 🙂

 

 

Introduction to Automating SSIS with BIML

2 videos from our channel giving a practical introduction to automating SSIS with BIML and C#. Part 1 shows the basics of BIML scripting. Part 2 shows using C# and BIML to dynamically create SSIS assets using patterns and meta data.

Part 1: Introduction to BIML Scripting

Part 2: Introduction to Dynamic BIML Scripting with C#

Multi-Valued Attributes in SQL Server Analysis Services (SSAS)

A multi-valued attribute in data warehousing terms is a dimension attribute that has more than 1 value for a particular dimension member. Rather than focus too much on the warehouse technique in question Vincent Rainardi’s blog post provides a very good technically agnostic explanation far better than I could.

When we come to model this in SSAS we can consider each of the solutions presented by Vincent:

  1. Lower the grain of the dimension
  2. Put the attribute in another dimension, linked directly to the fact table
  3. Use a fact table (bridge table) to link the 2 dimensions
  4. Have several columns in the dim for that attribute
  5. Put the attribute in a snow-flaked sub dimension
  6. Keep in one column, delimited using commas or pipes

Option 6 isn't very practical or elegant for SSAS since it implies that you'd need a front end with the intelligence to separate out the delimited values and deal with the facts appropriately. This is not very practical considering how tools like Excel and Reporting Services access the Unified Dimensional Model (UDM).

Option 4 again isn’t very SSAS friendly regarding presentation and can get out of hand very quickly. In scenarios where multivalued attributes occur it can be very prevalent across the business, often where products are classified many different ways, for example Movies. Creating columns for every value for every multi-valued attribute can become a bit of a mare, also it’s not a very dynamic solution.

Option 2, placing the attribute in another dimension, will work but isn't very elegant from a user perspective because of the way SSAS exposes dimensions and attributes. Conceptually, from a user navigation and analysis perspective, they are attributes of the same dimension and thus should be organised as such, without having to query across the fact table to get the results you want. Once you've separated the attributes into separate dimensions you cannot build hierarchies that provide intuitive guidance on how users should explore and report the data. It also gets even trickier, since you have to lower the grain of the fact table to avoid double counting, which means allocation! Allocation can be a tricky, sluggish and ETL-labouring solution. I do acknowledge, however, that pushing logic back to the ETL layer can do wonders for reporting query response times.

To get the desired output in SSAS I’m going to make use of a mixture of options 1, 2 and 3 with some slight but important differences to how Vincent has stated them. I will lower the grain of the dimension logically and not physically. I’ll put the attributes in a second dimension but link indirectly to fact table through the bridge table presented in option 3. I’ll also use some SSAS configuration in order to present all the attributes in what appears to be a single physical dimension linked directly to the fact table for the user.

The Solution

So our aim:

  • Accommodate multi-valued attributes in a single dimension exposed to the user
  • Avoid complex allocation procedures to lower the grain of the fact table
  • Accommodate many-to-many relationships between multi-valued attributes without double counting the fact

For our example we’ll consider a great passion of mine Movies! Consider a basic additive measure with 2 dimensions:

Measure              Viewings

Dimension          Movie

Dimension          Date

Conceptually the Movie dimension has the following attributes associated with it:

  • Name                    (Single Value)
  • Genre                   (Multi-Value)
  • Theme                  (Multi-Value)
  • Language             (Multi-Value)

So if we consider 1 viewing of “The American” directed by Anton Corbijn and 1 viewing of “Once Upon a Time in the West” directed by Sergio Leone, we want the data to appear as follows:

Date MQY     Movie                                     Measures
Day          Name                           Genre      Viewings
03-May-2011  The American                   Crime             1
                                            Drama             1
             Once Upon a Time in the West   Crime             1
                                            Western           1
Total                                                         2

Table 1: Movie Report by Name

However we break it out by the attributes, the total is always 2, since there have only been 2 viewings, i.e. we're not double counting. If we were to report the Movie dimension by Genre alone then the results would be as follows: 2 viewings count towards Crime and 1 viewing counts towards each of Drama and Western, yet the total viewings is still only 2.

Date MQY     Movie      Measures
Day          Genre      Viewings
03-May-2011  Crime             2
             Drama             1
             Western           1
Total                          2

Table 2: Movie Report by Genre

Figure 1: Metadata

The meta data in Figure 1 shows a basic Date dimension and the Viewing measure. It also shows the Movie dimension containing the Movie (name) and the multi-valued attributes Genre, Language and Theme all nicely organised into the movie dimension. If desired we could provide more structure to the Movie dimension by creating hierarchies since they all belong to the same dimension.

The Pattern

Being familiar with how SSAS models, I tend to think of this first and foremost as a fact table grain problem rather than a dimensional problem. Essentially we need to lower the grain of the fact table to avoid double counting across the multi-valued attributes. If we're not going to allocate the measures down, we can do this using a bridge table in SSAS and then use a bit of design trickery to hide the intermediate dimension and expose all the attributes in the outer dimension for the users to browse.

Relation Model

Figure 2 shows the structure of the source relational database:

Relational Database Model
Figure 2: Relational Database Model

The dimensions are highlighted blue and the fact / bridge tables are grey. Over to the right hand side we see the basic star schema which is the FactViewings with the 2 dimensions DimMovieName and DimDate containing single value attributes. Over to the left hand side we see that DimMovieName is joined to a DimMovieMVAJunk through a bridge table called BridgeMovieMVAJunk.

The DimMovieMVAJunk contains a Cartesian product of all the possible multi-valued attributes. We chose to use a junk dimension because in our particular dimension the data volumes aren't particularly scary and we don't have to concern ourselves too much with the actual relationships between the multi-valued attributes that exist or could exist in our data. Any relationships that exist today may be different tomorrow, so we'll just treat all the multi-valued attributes as many-to-many. If we know this then we can pre-populate the junk dimension with all the combinations we could possibly encounter and give it a surrogate key.

The BridgeMovieMVAJunk maps the multi-valued attributes to the single-value attributes in DimMovieName. We use a bridge table since we can assign more than one value of an attribute to one particular movie. We cannot combine MovieName and the other attributes together and bind them to the FactViewings table because the grain key of the fact table is MovieKey; if we combined all the attributes, the MovieKey of DimMovieName would no longer be the grain and SSAS would not report the correct values against the attributes and totals. We populate the bridge table with just the actual relationships that exist between DimMovieName and all the other attributes. The bridge table is effectively a factless fact table, and we can use the same fast loading techniques on it that we use on a regular fact table, i.e. it's easy to manage and load.
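As a minimal sketch, the junk dimension and bridge table described above might look something like the following. The key and attribute names follow the DimMovie view further down; the data types, and the StageMovieClassification table used to seed the Cartesian product, are purely illustrative, so substitute whatever holds your distinct Genre, Language and Theme values.

-- Sketch of the junk dimension and bridge table described above.
-- Data types are assumptions; MovieKey is assumed to be the primary key of DimMovieName.
CREATE TABLE dbo.DimMovieMVAJunk
(
    MovieMVAJunkKey int IDENTITY(1,1) PRIMARY KEY,
    Genre           varchar(50) NOT NULL,
    [Language]      varchar(50) NOT NULL,
    Theme           varchar(50) NOT NULL
)

-- Pre-populate with the Cartesian product of the multi-valued attribute values.
-- StageMovieClassification is a hypothetical source of the distinct values; substitute your own.
INSERT INTO dbo.DimMovieMVAJunk (Genre, [Language], Theme)
SELECT g.Genre, l.[Language], t.Theme
FROM (SELECT DISTINCT Genre FROM dbo.StageMovieClassification) g
CROSS JOIN (SELECT DISTINCT [Language] FROM dbo.StageMovieClassification) l
CROSS JOIN (SELECT DISTINCT Theme FROM dbo.StageMovieClassification) t

-- The bridge (factless fact) table holds only the relationships that actually exist.
CREATE TABLE dbo.BridgeMovieMVAJunk
(
    MovieKey        int NOT NULL REFERENCES dbo.DimMovieName (MovieKey),
    MovieMVAJunkKey int NOT NULL REFERENCES dbo.DimMovieMVAJunk (MovieMVAJunkKey),
    PRIMARY KEY (MovieKey, MovieMVAJunkKey)
)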

Also in the database we’ll use the following logic to create the view that will feed the Movie dimension in the cube that the users will use and see.

CREATE VIEW [DimMovie]
AS
SELECT
    m.[MovieKey],
    j.[MovieMVAJunkKey],
    m.[MovieName],
    j.[Genre],
    j.[Language],
    j.[Theme]
FROM DimMovieName m
INNER JOIN BridgeMovieMVAJunk b ON m.MovieKey = b.MovieKey
INNER JOIN DimMovieMVAJunk j ON j.MovieMVAJunkKey = b.MovieMVAJunkKey

Please note the grain of this dimension is now a component key of the single value attributes and the multi value attributes. Also ensure the database is adequately tuned for the execution of this statement otherwise (depending on data volumes) you might find it takes a while for your Movie dimension to process. You could also do this in the Data Source View (DSV) of the UDM but I’m not a fan of placing logic in the DSV unless there is real cause to do so.

UDM

Figure 3 shows the structure of the SSAS DSV.

SSAS DSV
Figure 3: SSAS Data Source View

The SSAS DSV is almost identical to the relational model, except we've used the view DimMovie that combines all the movie attributes together and joined it to the bridge using the component key, i.e. we've tagged the single-value attributes onto the multi-valued attributes to get our complete dimension.

Figure 4 shows the cube structure in SSAS where we’ve created 2 measure groups and 3 dimensions.


Figure 4: Cube Physical Structure

We have a fact table containing the core measures, which in our case is [FactViewings], added as a measure group called [Viewings]. We have a bridge table called [BridgeMovieMVAJunk] that is added as a measure group called [BridgeMovieAttributes]; either hide the default count measure or delete it completely so that users are unaware this measure group exists.

The [System Movie] dimension is created from the [DimMovieName] table and as the name implies is for utility purposes and will be hidden using the visible property. This dimension simply contains the grain key of the single-value attributes and is used as an intermediate dimension to bind to [FactViewings]. The [Movie] dimension is created from the [DimMovie] view dimension and contains all the movie attributes and binds to the [BridgeMovieMVAJunk] using a component key.

Figure 5 shows the dimensionality.


Figure 5: Dimension Usage

The final setup for the cube is to ensure the dimensionality is configured correctly. Here we see [System Movie] sits across both measure groups using a regular relationship; remember that [System Movie] and [Bridge Movie Attributes] are both hidden. The [Movie] dimension is bound to [Bridge Movie Attributes] using a regular relationship; when creating this relationship you must ensure it is bound using the full component key, i.e. [MovieKey] and [MovieMVAJunkKey]. The [Movie] dimension is then bound to the core measure group [Viewings] using a many-to-many relationship. When processed you should end up with the metadata presented in Figure 1. All fairly straightforward!

The Results

Ok, so now we have a structure that we think will do the job. All that is left to do is put in some data and see if it works. I've set up the data in my tables with the following movie classifications:

Table 3: Movie Classifications

I’ve also set up the data with the following viewings:

Table 4: Movie Viewings

Having loaded the data and processed the cube I’ve pulled out the data in Excel which can be seen in Figure 6 and figure 7.

Figure 6: Reporting Against a Single Attribute

In figure 6 we can see the metadata nicely organised, with the single and multi-value attributes all available together under the single [Movie] dimension. In the Movie Viewings report, notice that summing the figures correctly matches the total value of 21. Below that report is the Genre Viewings report; notice here that if we manually sum up the viewing figures we get 49, which is not correct, since we would be double counting viewings where a movie exists in more than one Genre. The cube grand total correctly shows 21 and does not double count the viewings, because of the logic we have created to handle multi-valued attributes.

Figure 7: Reporting Against Multiple Attributes

In figure 7 we can see how single and multi-value attributes work side by side in further detail.  Notice that Genre totals are a sum of the viewings for Movies within the Genre. Also notice that Movies exist in more than one Genre and yet the Grand Total is still only 21 and that we are not double counting e.g. the viewings for ‘Very Bad Things’ count towards Thriller, Crime and Comedy.

I've included other multi-value attributes in the design pattern just to show how it can be done. I've only played with one multi-valued attribute here so that I don't end up writing the world's longest blog post! So by all means knock up a cube and have a play, and you should see that all the attributes work nicely together.

Conclusion

We’ve successfully modelled multi-valued attributes into a single dimension within a cube whilst handling the issue of aggregating correctly by:

  1. Lowering the grain of the dimension to store all the multi-value attributes together in a junk dimension
  2. Creating a bridge table to map the multi-value attributes to the single value attributes
  3. Linking all the attributes in a single dimension to the fact table through the bridge table and single-value attribute dimension

We used the many to many dimension functionality of SSAS along with the customisation of some properties to bind the single and multi-value attributes onto the core fact whilst hiding the complexity involved from the user.

In terms of scalability and performance it’s going to really depend on your situation. We’ve pushed the logic into the cube structure and processing so it’s definitely going to be a lot better than using lots of complex MDX scoping which I’ve seen some solutions try to use. If the bridge table and junk dimension are quite large then you might need to keep an eye on processing performance since our Movie dimension is joining across those two tables plus the single value attribute dimension. The use of many to many SSAS relationships isn’t great for performance though it depends on the volumes, other complexities and any tuning you have performed on your cube. All in all it’s an elegant and simple design pattern that works well making the use of the cube structure without having to do lots of work in the warehouse to reduce the fact table grain.

MS Tabular – Simple Custom Rollup

Introduction

 

I have been having a bit of hands-on with SSAS Tabular lately, and one of the aspects that I think is extremely powerful is that it provides users with the ability to mash up their own data in various ad hoc data sources with data from a structured SQL Server database. More specifically, allowing users to create and control how granular attributes of a dimension roll up into a hierarchy is a valuable thing, for example an organisational structure or an accounts hierarchy. Users do need to be quite good with data modelling and DAX to leverage this, and time will tell on that one. With that in mind, I'm going to create a user-controlled custom rollup on an accounts dimension using only very basic data modelling and DAX concepts.

 

Essentially, within an accounting solution you might want to avoid storing numbers in the context of positive (+ve) and negative (-ve), since in accounting terms either the business owns something or it doesn't, i.e. it's either an asset or a liability. Also, things come into the business and things go out, i.e. it purchases supplies and sells goods and services. None of the accounts themselves are either +ve or -ve, since in accounting we deal with either credits or debits. Depending on what aspect of the business you're evaluating and how you're presenting the figures, an account may be considered in credit in one context but in debit in another. For example, the Closing Balance of Stock is considered a Credit in the P&L but a Debit in the Balance Sheet.

 

In BI data modelling terms we deal with this by associating the aggregation operator with the reporting dimension hierarchy rather than the number itself thus when the measure is combined with the hierarchy it is aggregated within the context of the hierarchy. More specifically if you’re familiar with SQL Server Analysis Services Multidim we do this using a rollup operator that is stored on the accounts dimensions.

 

This blog is a simple DAX design pattern to allow business users to create their own custom rollups on an accounts dimension and trial balance fact table from SQL Server.

Solution

 

I always like to present the final solution before going into the specifics of the design since knowing the final output tends to make design details a bit easier to digest. So here’s what we’re aiming for:
Custom Rollup in PowerPivot

 

So here we can see I've got a P&L hierarchy that rolls up the accounts through Gross Profit and Expenses and calculates the Profit. The accounts and figures have all been taken from a SQL Server database; however, which accounts are included and how they roll up into Cost of Goods Sold, Sales, Gross Profit, Expenses and Net Profit has all been created and controlled by the Excel user. The P&L rollup is created from the following table in the Excel file, which is linked into the Tabular model:

customrollup2

 

So in this table we can see the accounts mapping to relevant parent P&L members and a rollup operator that indicates how the account should be evaluated. The Rollup operators should look familiar if you’re from a SSAS Multidim background like me since the symbols are the same. + is for sum, – is subtract and ~ is to exclude. Notice there are many accounts that are not used here since they are not included in the P&L and as such have no parents defined either. A different rollup application might have the numbers excluded from the rollup but still have them included in the hierarchy.

Implementation

 

The underlying SQL Database contains an Accounts dimension, Date dimension and a fact table containing account Balances. The simple star schema is as follows:

customrolloup3

From the diagram we can see that the Account dimension is simply a list of accounts and there is no hierarchy. FactBalance contains the account trial balances which are all physically positive numbers which can conceptually be either summed or subtracted. I’ve included a very simple Date dimension as usual since it’s always good to see how a solution works with more than 1 dimension in play to test and understand what is going on inside the software without making it too complex.
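As a rough sketch, the underlying tables look something like the following; the key names (Account_Key, Date_Key) and the Balance column follow the post, while the remaining column names and data types are assumptions.

-- Rough sketch of the source star schema; non-key column names and types are assumptions.
CREATE TABLE dbo.DimAccount
(
    Account_Key int IDENTITY(1,1) PRIMARY KEY,
    AccountCode varchar(20) NOT NULL,
    AccountName varchar(100) NOT NULL
)

CREATE TABLE dbo.DimDate
(
    Date_Key int PRIMARY KEY,   -- e.g. 20170201
    [Date]   date NOT NULL
)

CREATE TABLE dbo.FactBalance
(
    Account_Key int NOT NULL REFERENCES dbo.DimAccount (Account_Key),
    Date_Key    int NOT NULL REFERENCES dbo.DimDate (Date_Key),
    Balance     decimal(18,2) NOT NULL   -- stored as physically positive trial balances
)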

The next step is to pull these 3 tables into SSAS Tabular. I won't go into how this is done in detail, since there are many resources that describe the process and I'm assuming that if you're looking to do custom rollups then you've already made a start with the basics.

Below is the Accounts dimension and Balance fact table that I’ve pulled into Tabular and renamed for usability.

customrollup4
The next step is to go to the Balance table tab and relate Account_Key to the Account_Key in the Account table, and Date_Key to the Date_Key in the Date table. If your database had these relationships defined when you imported the tables then Tabular will have created them automatically.

So, now we have our Tabular model, we're going to create a table in Excel to roll up the Accounts dimension. Enter the following data into a spreadsheet in the workbook.

customrollup5

 

With Cell A1 selected click the Link Table icon on the Excel PowerPivot tab.

customrollup7

This will import the data into the PowerPivot model, linked directly to the data in the Excel sheet. Rename the new table we've just imported to something useful and relate the Account dimension to it using the Account Code; it should look as follows:

customrollup7

So now we have all the data nicely imported into PowerPivot and related together, we'll apply the custom rollup. To do this I'm going to use the way that SSAS actually applies the operators under the covers (I believe, but am sure someone will correct me if I'm wrong). Essentially the + operator multiplies the fact data by +1, – multiplies it by –1 and ~ multiplies it by 0 or null, effectively yielding null. After the operator has been applied, which adjusts the sign of the data or nulls it out, we leave it to PowerPivot to simply sum the result to give the answer. At this point you're probably right in thinking I could've just used +1, –1 and 0 instead of +, – and ~, however I just wanted it to be consistent with what we're (or I'm) used to in SSAS.
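The same arithmetic can be expressed in T-SQL against the source tables, which may make the mechanism easier to see. This isn't how the Tabular model evaluates it, just the same sum-of-signed-values idea; the PLRollup table here is hypothetical and simply mirrors the Excel mapping table (AccountCode, ParentPL, RollupOperator).

-- Hypothetical T-SQL equivalent of the rollup mechanism:
-- translate the operator into +1 / -1 / 0 and let SUM do the rest.
-- PLRollup mirrors the Excel mapping table and is purely illustrative.
SELECT r.ParentPL,
       SUM(f.Balance *
           CASE r.RollupOperator
               WHEN '+' THEN 1
               WHEN '-' THEN -1
               ELSE 0              -- '~' excludes the account
           END) AS PLAmount
FROM dbo.FactBalance f
INNER JOIN dbo.DimAccount a ON a.Account_Key = f.Account_Key
INNER JOIN dbo.PLRollup r ON r.AccountCode = a.AccountCode
GROUP BY r.ParentPL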

In the Budget table we’re going to create a calculated column called PL Operator that uses the PowerPivot relationships we’ve defined to pull the operator down into the Budget table. The simple DAX formula I’ve used can be seen in the next screen capture:

customrollup8

Once we have the operator we can create another calculated column to create the PL Amount column that applies the operator by multiplying PL Operator with the Balance as shown below.

customrollup9

So we can see the PL Amount column now has the rollup operator applied. The only thing left to do is review what columns you want to make visible in the model and drop a pivot table onto a spread sheet and have a look. We could’ve just used 1 calculated column to achieve what we needed to do, I broke it down into 2 just for presentation purposes. Have a play at modifying the Excel based data rollups and refreshing the data model to see how quick and easy it is to manipulate the rollup.

Conclusion

This is a really simple way for users to consume data from a database and apply a very simple custom rollup. I really like the simplicity of it and the fact as a user I can just edit my rollups in my sheet and refresh the data and adjust how the accounts roll up. Also you might have noticed there are trial balance figures for the accounts in my data that would be needed to create a balance sheet and so we could progress this further by having another roll up table to create a Balance Sheet.

I'm going to play around a bit more and see if it's possible to create more complex rollups, and perhaps apply different rollup operators for different columns instead of just at the grain of the account. This is where I suspect things will become somewhat more complicated very quickly! Coming from an SSAS background I find it quite useful to take SSAS modelling concepts and challenge myself and PowerPivot to achieve the same end. PowerPivot modelling is a different way of thinking, and whilst it may not do what SSAS does out of the box, I'm convinced that where there is a will there is a way. Or maybe not.