Ordering Tables by Foreign Key Dependency

 
key.jpg

SQL tips

Foreign Key Dependency

 

As a SQL developer I love constraints, and fewer constraints come in handier than foreign keys. Mappings between two tables become defined to both human and server: other developers understand how your tables come together, and the server prevents nonsense from being written to tables.

This being said, foreign keys become an administrative hassle when replicating data between other databases or from backup. Any table that references any other table will have to be loaded after its precedent to avoid foreign key violations, and inexplicably few database systems allow for a means to order tables by their dependencies. This doesn’t mean to say that doing so is impossible though.

Suppose we had the following database design:

sqlblog.png

In Microsoft SQL Server there is helpfully the foreign_key_columns system view which will list all foreign keys, their parent objects/columns and their referenced object/columns. Used in conjunction with other system views like tables, this becomes an effective means of determining which tables to load first.

The following method populates a temporary table, #AllTables, with every table on the current database and assigns an appropriate dependency_number value:

/* 

* Temporary table used such that

* the compiled list can be

* freely interacted with later

*/

CREATE TABLE #AllTables(
[object_id] INT NOT NULL
,[name] NVARCHAR(128) NOT NULL
,[schema_name] NVARCHAR(128) NOT NULL
,[dependency_number] SMALLINT NULL
);

— Step #1
INSERT INTO #AllTables([object_id],[name],[schema_name])
SELECT t.[object_id],t.[name],s.[name]
FROM [sys].[tables] t
INNER JOIN [sys].[schemas] s ON t.[schema_id] = s.[schema_id]
ORDER BY t.[object_id];

— Step #2
— Dependency #0 tables

UPDATE #AllTables
SET [dependency_number] = 0
WHERE [object_id] NOT IN(
SELECT [parent_object_id]
FROM [sys].[foreign_key_columns]
);

— Step #3
— Dependency #n tables

DECLARE @n SMALLINT;
SELECT @n = 1;
WHILE EXISTS(
SELECT * FROM #AllTables
WHERE [dependency_number] IS NULL
) AND(@n < 50)


— Safety first

BEGIN

UPDATE o
SET o.[dependency_number] = so.[dependency_number]
FROM #AllTables o
INNER JOIN(
SELECT po.[object_id],MAX(ro.[dependency_number]) + 1
FROM [sys].[foreign_key_columns] fkc
INNER JOIN #AllTables po
ON fkc.[parent_object_id] = po.[object_id]
INNER JOIN #AllTables ro
ON fkc.[referenced_object_id] = ro.[object_id]
WHERE fkc.[parent_column_id] != fkc.[referenced_column_id]
AND po.[dependency_number] IS NULL
AND NOT EXISTS(
SELECT *
FROM [sys].[foreign_key_columns] sfkc
INNER JOIN #AllTables sro
ON sfkc.[referenced_object_id] = sro.[object_id]
WHERE sfkc.[parent_object_id] = po.[object_id]
AND sro.[dependency_number] IS NULL

GROUP BY po.[object_id]
) so([object_id],[dependency_number])
ON o.[object_id] = so.[object_id];

SET @n = @n + 1;

END;

First of all we would need the tables. This list can be acquired easily enough from the tables system view, after which we could use our own table henceforth. Of these tables we can immediately mark those which do not exist in foreign_key_columns as a parent table. These will have a dependency number of #0. Going back to our example database, the tables Person and Product will have dependency #0.

Then we enter the loop. Of the tables that remain unmarked, we intend to find all the tables which refer directly to our dependency #0 tables to begin with, then our dependency #1 tables and so forth. The dependency number of a table should be the highest there is, plus one. The NOT EXISTS clause is there to guarantee that only tables which do not depend on tables with unknown dependency numbers. Without this, OrderProduct would also be given a dependency of #1, which would be incorrect: it refers to CustomerOrder.

After running on our example database we get:

name dependency_number
—————- —————–
Person 0
Product 0
Price 1
CustomerOrder 1
OrderProduct 2

From now on if ever you need to load data into such a database, all that you would need to do is load Person and Product data into the database first, then Price andCustomerOrder data, and so on. Deleting data/truncating tables? Easy – just work in reverse!

The loop assumes that no circular dependencies exist (e.g. two tables referencing each other). If they do, the loop will break out after the 49thiteration anyway. Alas, you will have no alternative to handling such dependencies other than to disable the foreign keys with:

ALTER TABLE <table_name> NOCHECK CONSTRAINT <foreign_key_name>

Remember to restore your foreign keys afterwards though!

White PaperNigel Ivy