This is a post regarding a SQL design pattern for applying configuration attributes to a parent child table. There’s a bunch of obvious use cases particularly around metadata and ETL control & configuration repositories…
Imagine if your ETL has hundreds if not thousands of processes. Often it makes sense to model process meta data in parent child table since you have 1 root process that when executed executes a hierarchy of child process. If then you have configuration options for those processes you won’t want to maintain a configuration row for every process right? that would be a huge maintenance headache. Ideally we would configure the root process with options that the child processes just inherit unless they’re specifically overridden. This gives us the best of both worlds in terms of being able to have the flexibility of specific configurations for child processes but where they don’t have configurations then inherit the configurations from the parent resulting in less configuration maintenance. This model doesn’t persist the inherited configuration but remains sparse and derives the inheritance logically when queried removing the need for updates or inserts.
We’ll have 2 tables:
- The parent child entity: process_executable
- The configuration attributes: process_configuration
Here they are:
CREATE TABLE [etl].[process_executable] ( [process_executable_id] INT IDENTITY (1, 1) NOT NULL, [parent_process_executable_id] INT DEFAULT ((-1)) NOT NULL, [connection_id] INT NOT NULL, [process_executable_name] VARCHAR (200) NOT NULL, [created_date] DATETIME DEFAULT (getdate()) NOT NULL, [is_enabled] BIT DEFAULT ((1)) NOT NULL CONSTRAINT [PK_process_executable] PRIMARY KEY CLUSTERED ([process_executable_id] ASC) ); CREATE TABLE [etl].[process_configuration] ( [process_configuration_id] INT IDENTITY (1, 1) NOT NULL, [process_executable_id] INT NULL, [number_of_retries] INT NULL, [is_incremental] BIT NULL, CONSTRAINT [PK_process_configuration] PRIMARY KEY CLUSTERED ([process_configuration_id] ASC) );
Here is the solution
First left join the configurations in so we get all process with and without configurations. Then recurse down the hierarchy using the a recursive CTE defaulting the configurations to the parent values where they are null.
WITH [cte_process_configuration] AS --join in the configuration ( SELECT p.[process_executable_id], p.[parent_process_executable_id], pc.number_of_retries, pc.is_incremental FROM etl.process_executable p LEFT JOIN [etl].[process_configuration] pc ON p.process_executable_id = pc.process_configuration_id ), cte_collapse AS --recurse down tacking the parent settings down to children where they don't have settings. ( SELECT m.process_executable_id, m.parent_process_executable_id, m.max_threads, m.number_of_retries, m.is_incremental FROM [cte_process_configuration] m WHERE m.parent_process_executable_id = -1 UNION ALL SELECT m.process_executable_id, m.parent_process_executable_id, --carry down the settings where there aren't any ISNULL(m.number_of_retries, pm.number_of_retries) AS number_of_retries, ISNULL(m.is_incremental, pm.is_incremental) AS is_incremental FROM [cte_process_configuration] m JOIN cte_collapse pm ON m.parent_process_executable_id = pm.process_executable_id ) --show me the money! SELECT process_executable_id, number_of_retries, is_incremental FROM cte_collapse c