Quick Tip – SSAS Tabular Referential Integrity

SSAS tabular is a tad more loosely coupled than multi-dim. A quick and easy way to determine referential integrity violations on columns is to use the $system.DISCOVER_STORAGE_TABLES. This DMV query will return a rowset with the table details and a column called RIVIOLATION_COUNT that indicates the number of rows with a referential integrity violation.

select * from $system.DISCOVER_STORAGE_TABLES

SSAS RI Violation

 

Finer Points – DAX : Calculated Columns and Context

This could be considered “The Basics” but it depends where you are on your learning journey. First lets review some high level guidance whether to use a calculated column or measure:

Column:

  • Row based calculations – if you’re a SQL whizz think.
    f() OVER(PARTITION BY ... ORDER BY...)

    We get the same number of rows but we can aggregate within a row over the whole table or part of the table depending on the values of the current row.

  • Can be used on any Axis, including the slicer – We’ve effectively created another attribute that can be used on rows, columns or slicer (which is just another axis).

Measure:

  • Aggregated Calculations  – Generally we use it to aggregate data across the whole model and perform calculations using aggregates. Think ratios and again if you’re a SQL whizz you’d probably use a bunch of sub queries, aggregate functions and join it together to calculate your ratio
  • User Context Sensitive – These calculations will be aggregated using the calculation you’ve defined across the selections on the axis a user makes i.e. they are context sensitive based on user selections. SQL cannot do this using a single expression – this is why cubes are very analytically powerful compared to SQL but conversely DAX is much harder to fully grasp. Generally speaking DAX will offer better performance for aggregates most of the time (when done correctly).

The journey to DAX hasn’t been to bad for myself since my MDX was pretty solid. Once you get over the differences of dealing with tables it’s a home run since the fundamental of context based calculations over user selected combinations of the whole data model is a way of thinking required for both DAX and MDX. It’s that way of thinking that is the learning curve.

Comparative Column Calculations

So we’re going with column just for the learning points. I’m blogging about this particular topic since if you’re coming from SQL or MDX then it might be helpful. Typically we would be familiar with the following:

  • MDX: CurrentMember – In DAX there is no such thing!
  • SQL: OVER() – In DAX there is no such thing!

So imagine the scenario:

We have a table of categorised granular measures and we want to create a calculated column for each row using:

  • Each row value n
  • & N = Aggregate(n) for category of n

Note that aggregate may be sum, min, max, etc.

Example

We have a data set of run times for sequential processes split across 2 parallel streams. The data is at the grain of the process execution. I have:

  • Stream : the stream it executed in
  • Process : name of the process
  • StartDate : when it started
  • EndDate : when it ended

I want to plot a Gantt Chart using a Stacked Bar Chart of each stream to compare race conditions to visualise if it could be more efficient. To do this I need to calculate:

  • ExecutionDuration : Easy EndDate – StartDate (in seconds)
  • StreamStartDate : This is the MIN(StartDate) of a given stream; we need it to calculate the WaitDuration. Perhaps don’t need this step but helps with understanding.
  • WaitDuration : StartDate – StreamStartDate (in seconds)

So some example data might be as follows:

DAX Earlier

In SQL

DAX Earlier 1

In DAX

In order to calculate the WaitDuration my brain immediately jumps to currentMember since I can refer to the current member context for the startdate and an aggregated min() startdate over the context over the whole data set with the same stream. But there are no members and there is no current member so how to we do this in DAX. Again in multiple steps for clarity.

Column: StreamStartDate

StreamStartDate =
MINX (
    FILTER (
        'ProcessExec',
        'ProcessExec'[Stream] = EARLIEST ( 'ProcessExec'[Stream] )
    ),
    'ProcessExec'[StartTime]
)

Column: WaitDuration

WaitDuration = 
DATEDIFF('ProcessExec'[StreamStartDate], 'ProcessExec'[StartDate], SECOND)

Column: RunDuration

RunDuration = 
DATEDIFF('ProcessExec'[StartDate], 'ProcessExec'[EndDate], SECOND)

WaitDuration and RunDuration require no explanation they should be easily understood.

StreamStartDate however makes use of the Earliest function which does require explanation. I’ve seen various explanations over the years and the docs themselves have changed. Essentially it’s hard explain and this is the crux of why DAX is hard to understand. Start simple and build – this is a pretty simple example.

Earliest and Earlier functions refer to previous execution contexts! The Earliest being the very 1st.

In English the calc is:

Get the MIN StartDate for a set of data where the Stream is equal to the Stream in the current row.

MINX takes a table and an expression to aggregate in this case min.

MINX(<table>, <expression>)

We’ve passed in a filtered table to <table> to contain only those rows that have a stream equal to the stream in our current row.

    FILTER (
        'ProcessExec',  <-- this has a table context!
        'ProcessExec'[Stream] = EARLIEST ( 'ProcessExec'[Stream] ) <-- acts as where condition
    )

Because Column calculations start with a ROW context we switch context from the table context (passed to the filter function) to row context using the Earliest function to compare Stream values on the filter.

Column or Measure

I thought this would be a worth while example because it starts to condition your thinking away from current member. However should this be a calculated column or measure? I’ll leave that one to stew with the following points:

  • Is duration a useful additive continuous measure? Yes
  • Would it be cool if we could efficiently drill up and down on our Gantt chart for large data sets? Yes (ignoring the limitations in Power BI data vis!)
  • Will the continuous duration measure be useful as categorical data on rows, column, slicer or any other axis? If it’s banded it might be otherwise no.
  • Do we want the measure to display aggregated correctly no matter what selections the user makes with other measures? Yes
  • Calculated columns persist the data into model memory, do we need to? No

MS Tabular – Simple Custom Rollup

Introduction

 

I have been having a bit of hands on with SSAS Tabular just lately and one the aspects that I think is extremely powerful is that it provides users with the ability to mash up their own data in various ad-hoc data sources with data from a structured SQL Server database. More specifically allowing users to create a control how granular attributes of a dimension might rollup up into a hierarchy is a valuable thing, for example an organisational structure or an accounts hierarchy. Users do need to be quite good with data modelling and DAX to leverage this and time will tell on that one. However, with that in mind I’m going to create a user controlled custom rollup on an accounts dimension using only very basic data modelling and DAX concepts.

 

Essentially within an accounting solution you might want to avoid storing numbers in the context of positive (+ve) and negative (-ve) since in accounting terms either the business owns it or it doesn’t i.e. it’s either an asset or a liability. Also things come into the business and things go out i.e. they purchase supplies and sell goods and services. None of the accounts themselves are either +ve or –ve since in accounting we deal with either credits or debits. Depending on what aspect of the business you’re evaluating and how you’re presenting the figures an account may considered as in credit for one context but in debit for another. For example the Closing Balance of Stock is considered a Credit in the P&L but a Debit in the Balance Sheet.

 

In BI data modelling terms we deal with this by associating the aggregation operator with the reporting dimension hierarchy rather than the number itself thus when the measure is combined with the hierarchy it is aggregated within the context of the hierarchy. More specifically if you’re familiar with SQL Server Analysis Services Multidim we do this using a rollup operator that is stored on the accounts dimensions.

 

This blog is a simple DAX design pattern to allow business users to create their own custom rollups on an accounts dimension and trial balance fact table from SQL Server.

Solution

 

I always like to present the final solution before going into the specifics of the design since knowing the final output tends to make design details a bit easier to digest. So here’s what we’re aiming for:
Custom Rollup in PowerPivot

 

So here we can see I’ve got a P&L hierarchy that rolls up the accounts through Gross Profit and Expenses and calculates the Profit. The accounts and figures have all been taken from a SQL Server database however which accounts and how they have been rolled up into Cost of Goods Sold, Sales, Gross Profit, Expenses and Net Profit has all been created and controlled by the Excel user. The P&L roll up is created from the following table in the Excel file that is linked into the Tabular model:

customrollup2

 

So in this table we can see the accounts mapping to relevant parent P&L members and a rollup operator that indicates how the account should be evaluated. The Rollup operators should look familiar if you’re from a SSAS Multidim background like me since the symbols are the same. + is for sum, – is subtract and ~ is to exclude. Notice there are many accounts that are not used here since they are not included in the P&L and as such have no parents defined either. A different rollup application might have the numbers excluded from the rollup but still have them included in the hierarchy.

Implementation

 

The underlying SQL Database contains an Accounts dimension, Date dimension and a fact table containing account Balances. The simple star schema is as follows:

customrolloup3

From the diagram we can see that the Account dimension is simply a list of accounts and there is no hierarchy. FactBalance contains the account trial balances which are all physically positive numbers which can conceptually be either summed or subtracted. I’ve included a very simple Date dimension as usual since it’s always good to see how a solution works with more than 1 dimension in play to test and understand what is going on inside the software without making it too complex.

The next step is to pull these 3 tables into SSAS tabular. I won’t go into how this is done in detail since there’s there are many resources that describe this process and I’m assuming if you’re looking to do custom rollup then you’ve already made a start with the basics.

Below is the Accounts dimension and Balance fact table that I’ve pulled into Tabular and renamed for usability.

customrollup4
The next step is to go to the Balance table tab and relate for Account_Key to the Account_Key in the Account table and Date_Key to the Date_Key in the Date table. If your database had these relationships when you imported the tables then Tabular will have created them automatically.

So, now we have our Tabular model we’re going to create a table in Excel to rollup the Accounts dimension. Enter the following data into spread sheet in the workbook.

customrollup5

 

With Cell A1 selected click the Link Table icon on the Excel PowerPivot tab.

customrollup7

This will import the data into the PowerPivot model linked directly to the data in the Excel sheet. Rename the new table we’ve just imported to something useful and relate the Account dimension to it using the Account Code and it should look like as follows:

customrollup7

So now we have all the data nicely imported into PowerPivot and related together we’ll apply the custom rollup. To do this I’m going to use the way that SSAS actually applies the operators under the covers (I believe but am sure someone will correct me if I’m wrong). Essentially + operator multiples the fact data by +1, – multiplies the fact data by –1 and ~ multiplies the fact by 0 or null to effectively yield null. After the operator has been multiplied which adjusts the sign of the data or nulls it out we leave it to PowerPivot to simply sum the result to give the answer. At this point you’re probably right in thinking I could’ve just used +1, –1 and 0 instead of +, – and ~ however I just wanted it to be consistent with what we’re (or I’m) used to in SSAS.

In the Budget table we’re going to create a calculated column called PL Operator that uses the PowerPivot relationships we’ve defined to pull the operator down into the Budget table. The simple DAX formula I’ve used can be seen in the next screen capture:

customrollup8

Once we have the operator we can create another calculated column to create the PL Amount column that applies the operator by multiplying PL Operator with the Balance as shown below.

customrollup9

So we can see the PL Amount column now has the rollup operator applied. The only thing left to do is review what columns you want to make visible in the model and drop a pivot table onto a spread sheet and have a look. We could’ve just used 1 calculated column to achieve what we needed to do, I broke it down into 2 just for presentation purposes. Have a play at modifying the Excel based data rollups and refreshing the data model to see how quick and easy it is to manipulate the rollup.

Conclusion

This is a really simple way for users to consume data from a database and apply a very simple custom rollup. I really like the simplicity of it and the fact as a user I can just edit my rollups in my sheet and refresh the data and adjust how the accounts roll up. Also you might have noticed there are trial balance figures for the accounts in my data that would be needed to create a balance sheet and so we could progress this further by having another roll up table to create a Balance Sheet.

I’m going to have a play around a bit more and see if it’s possible to create more complex rollups and perhaps apply different rollup operators for different columns instead of just at the grain of the account. This is where I suspect things will become somewhat more complicated very quickly! Coming from a SSAS background I find it quite useful to take SSAS modelling concepts and challenge my self and PowerPivot to achieve the same end. PowerPivot modelling is a different way of thinking and whilst it may not do what SSAS does out of the box I’m convinced where there is a will there is a way, or maybe not.