Using Merge & Hashbytes for Slowly Changing Dimensions

In this post, we will have a look at how to perform slowly changing dimensions using MERGE & HASHBYTES. With this method, you can effectively perform INSERT, UPDATE & DELETE operations based on the differences found between two tables, namely, your source & target table.

Note: You will be able to perform SCD Type 1 using the example provided in this post.

HASHBYTES

HASHBYTES is a cryptographic algorithm that can output the below hashes:

MD2, MD4, MD5, SHA, SHA2_256, SHA2_512

Example of output: 0x5265E7879B621F3C3932FDF297C94E921FB8B9C2005F95F7DBEF04A61623907E

Note: Since SQL Server 2016, all algorithms besides SHA2_256 and SHA2_512 are deprecated.

Breakdown of the Merge Statement

In order to simplify the understanding of the MERGE & HASHBYTE query, I will break it down into smaller fragments to explain how it works.

In the first part of the MERGE statement, you need to specify the target table which you would like to insert into – note that I specified it as [TARGET]. After that, you need to specify the source table [SOURCE]. Lastly, you will specify the columns which the MERGE should join on to determine a match, e.g. ON [TARGET].[CustomerCode] = [SOURCE].[CustomerCode].

Now if we look at the below query, we specify the condition WHEN NOT MATCHED BY TARGET which means if the ON [TARGET].[CustomerCode] = [SOURCE].[CustomerCode] was not met, then we want to insert the data into the [TARGET] table. Notice in the VALUES section the HASHBYTES. Here I use HASHBYTES to create a unique hash key which is inserted into the [TARGET] table and will be used later for tracking changes.

Note: You can build the hash key based on columns you would like to track for SCD.

In the last part of the MERGE statement, we specify the WHEN MATCH condition, which means if the condition ON [TARGET].[CustomerCode] = [SOURCE].[CustomerCode] was met (value exists), then we want to update the [TARGET] table. However, we only want to update the table if there was a change. That is why we use the hash key once again to check if there was a change in any of the columns you track for changes. Lastly, when performing an update we also need to update the hash key once again.

The Complete Merge

The Merge in Action

In order to see how it works, I will run the MERGE for the first time which means the [TARGET] table is currently empty. As you can see below all rows are inserted from the [SOURCE] table to the [TARGET] table.

Running it a second time

Notice that if I run the exact statement once again, no rows will be affected because no changes occurred inside the [SOURCE] table.

Making a change to the source table

Now that we can see the insert is working and that it doesn’t insert duplicated rows, let’s make a change to the [SOURCE] table and see what happens.

As we can see the above update has updated 58058 rows. Therefore, we can expect the MERGE statement to update 58058 rows.

Finally, we can see that the MERGE statement performs smoothly inserting new rows and updating existing changed data.

You may also like...

1 Response

  1. Arthemis says:

    Brilliant post, well done.

Leave a Reply

Your email address will not be published. Required fields are marked *