The Basics – T-SQL: Recursive Inheritance

This is a post regarding a SQL design pattern for applying configuration attributes to a parent child table. There’s a bunch of obvious use cases particularly around metadata and ETL control & configuration repositories…

Imagine if your ETL has hundreds if not thousands of processes. Often it makes sense to model process meta data in parent child table since you have 1 root process that when executed executes a hierarchy of child process. If then you have configuration options for those processes you won’t want to maintain a configuration row for every process right? that would be a huge maintenance headache. Ideally we would configure the root process with options that the child processes just inherit unless they’re specifically overridden. This gives us the best of both worlds in terms of being able to have the flexibility of specific configurations for child processes but where they don’t have configurations then inherit the configurations from the parent resulting in less configuration maintenance. This model doesn’t persist the inherited configuration but remains sparse and derives the inheritance logically when queried removing the need for updates or inserts.

We’ll have 2 tables:

  1. The parent child entity: process_executable
  2. The configuration attributes: process_configuration

Here they are:

CREATE TABLE [etl].[process_executable] (
 [process_executable_id] INT IDENTITY (1, 1) NOT NULL,
 [parent_process_executable_id] INT DEFAULT ((-1)) NOT NULL,
 [connection_id] INT NOT NULL,
 [process_executable_name] VARCHAR (200) NOT NULL,
 [created_date] DATETIME DEFAULT (getdate()) NOT NULL,
 [is_enabled] BIT DEFAULT ((1)) NOT NULL
 CONSTRAINT [PK_process_executable] PRIMARY KEY CLUSTERED ([process_executable_id] ASC)
);


CREATE TABLE [etl].[process_configuration] (
 [process_configuration_id] INT IDENTITY (1, 1) NOT NULL,
 [process_executable_id] INT NULL,
 [number_of_retries] INT NULL,
 [is_incremental] BIT NULL,
 CONSTRAINT [PK_process_configuration] PRIMARY KEY CLUSTERED ([process_configuration_id] ASC)
);

Here is the solution

First left join the configurations in so we get all process with and without configurations. Then recurse down the hierarchy using the a recursive CTE defaulting the configurations to the parent values where they are null.

WITH [cte_process_configuration]  
AS --join in the configuration 
(
  SELECT  
    p.[process_executable_id],  
    p.[parent_process_executable_id],  
    pc.number_of_retries, 
    pc.is_incremental 
  FROM etl.process_executable p 
  LEFT JOIN [etl].[process_configuration] pc ON p.process_executable_id = pc.process_configuration_id
 ), 
cte_collapse 
AS --recurse  down tacking the parent settings down to children where they don't have settings. 
(
  SELECT  
    m.process_executable_id, 
    m.parent_process_executable_id,
    m.max_threads,
    m.number_of_retries,
    m.is_incremental 
  FROM [cte_process_configuration] m 
  WHERE m.parent_process_executable_id = -1 

  UNION ALL

  SELECT 
    m.process_executable_id, 
    m.parent_process_executable_id, 
    --carry down the settings where there aren't any 
    ISNULL(m.number_of_retries, pm.number_of_retries) AS number_of_retries, 
    ISNULL(m.is_incremental, pm.is_incremental) AS is_incremental 
  FROM [cte_process_configuration] m 
  JOIN cte_collapse pm ON m.parent_process_executable_id = pm.process_executable_id 
)
--show me the money! 
SELECT 
  process_executable_id,  
  number_of_retries, 
  is_incremental 
FROM cte_collapse c

 

Advertisements