By Ken Whiteside

DataXperts

Using the new Merge Command to handle Insert / Update transactions.

This new command is extremely useful for being a one stop construct to handle what Oracle or Informatica commonly call Upserts. Of course you can get by with the old is exists logic and programatically handle the inserts and updates. That is until you get into a serious Data Warehouse load. Lots of rows of data where some will already exist in your fact tables, and others will be new inserts. How do you track the keys, whether existing or a new identity insert, to have on hand wh3en you pupulate the child dimensions? The Merge command give you an Output clause to expose that ID (or any other column) and whether it was inserted or updated!

Create a quick table in some test database.

CREATE TABLE Employee

(

Employee_ID INT IDENTITY,

Employee_first_name nvarchar(50),

Employee_last_name nvarchar(50),

Employee_email varchar(100),

Employee_hire_date datetime,

Employee_birth_date DATETIME,

Last_Updated DATETIME DEFAULT GETDATE()

)

Use the Code Snippet below to enter a couple records.

Run it once as is.

Then change @p_Employee_first_name to ‘employee2’ and run it again.

Notice when you run it, you will get output showing you whether it inserted or updated and the field values.

You should have two records, employee1 and employee2 at this point.

Now change @p_Employee_first_name back to ‘employee1’ and also change @p_Employee_email to ‘newemail@test.com’ and run it again.

When you run it this time, the Merge construct will find a match and execute the “WHEN MATCHED” section of code instead of the “WHEN NOT MATCHED” code. The output results will also show that it found a match. The $action will show UPDATE, and the record had the Employee_email and Last_Updated fields updated.

Code Snippet

Declare @p_Employee_first_name nvarchar(50),

@p_Employee_last_name nvarchar(50),

@p_Employee_email varchar(100) =null,

@p_Employee_hire_date datetime =null,

@p_Employee_birth_date datetime =null

set @p_Employee_first_name = 'employee1'

set @p_Employee_last_name = 'whiteside'

set @p_Employee_email = 'email@test.com'

set @p_Employee_hire_date = '01/01/2007'

set @p_Employee_birth_date = '06/01/1970'

MERGE INTO dbo.Employee AS T

USING (Select @p_Employee_first_name, @p_Employee_last_name) AS S (Employee_first_name, Employee_last_name)

ON T.Employee_first_name = S.Employee_first_name and T.Employee_last_name = S.Employee_last_name

WHEN MATCHED

THEN UPDATE SET T.last_updated = GETDATE(),

T.Employee_email = @p_Employee_Email

WHEN NOT MATCHED

THEN

INSERT

(

[Employee_first_name]

,[Employee_last_name]

,[Employee_email]

,[Employee_hire_date]

,[Employee_birth_date]

)

VALUES

(

@p_Employee_first_name

,@p_Employee_last_name

,@p_Employee_email

,@p_Employee_hire_date

,@p_Employee_birth_date

)

Output $action, inserted.*;

Tracking the Keys from the Merge operation for further use with surrogates

Let’s modify the Code Snippet just a bit to track the keys.

Run the Code Snippet below once as is, this will create ‘employee3’ for a third record

Now change @p_Employee_first_name to “employee1” and @p_Employee_email to “update@hotmail.com” . Then run the script for the first Declare down.

We can now see the stored results of what happened inside the hash table

Select * from #Employee

And of course the values in the Destination Fact Table with Select * from Employee

Code Snippet with Key Tracking

Create table #Employee

(

Employee_ID INT,

Employee_first_name nvarchar(50),

Employee_last_name nvarchar(50),

Employee_email varchar(100),

Employee_hire_date datetime,

Employee_birth_date DATETIME,

Last_Updated DATETIME

)

Declare @p_Employee_first_name nvarchar(50),

@p_Employee_last_name nvarchar(50),

@p_Employee_email varchar(100) =null,

@p_Employee_hire_date datetime =null,

@p_Employee_birth_date datetime =null

set @p_Employee_first_name = 'employee3'

set @p_Employee_last_name = 'whiteside'

set @p_Employee_email = 'email@test.com'

set @p_Employee_hire_date = '01/01/2007'

set @p_Employee_birth_date = '06/01/1970'

MERGE INTO dbo.Employee AS T

USING (Select @p_Employee_first_name, @p_Employee_last_name) AS S (Employee_first_name, Employee_last_name)

ON T.Employee_first_name = S.Employee_first_name and T.Employee_last_name = S.Employee_last_name

WHEN MATCHED

THEN UPDATE SET T.last_updated = GETDATE(),

T.Employee_email = @p_Employee_Email

WHEN NOT MATCHED

THEN

INSERT

(

[Employee_first_name]

,[Employee_last_name]

,[Employee_email]

,[Employee_hire_date]

,[Employee_birth_date]

)

VALUES

(

@p_Employee_first_name

,@p_Employee_last_name

,@p_Employee_email

,@p_Employee_hire_date

,@p_Employee_birth_date

)

OUTPUT INSERTED.Employee_ID AS Employee_ID,

INSERTED.Employee_first_name AS Employee_first_name,

INSERTED.Employee_last_name AS Employee_last_name,

INSERTED.Employee_email AS Employee_email,

INSERTED.Employee_hire_date AS Employee_hire_date,

INSERTED.Employee_birth_date AS Employee_birth_date,

INSERTED.Last_Updated AS Last_Updated

into #Employee;

Now imagine if you were loading data from one warehouse into another. Keys from the source exist to establish the surrogate relationships, but you cannot use those keys in the target Data Warehouse. The source relationships are already set in place mean nothing to the human eye, but you need to preserve that relationship, but must utilize new keys. You could easily add the OldKey to the #Employee table and store that value from the source data into it for each row. Now if you need to join back to more dimensions from the source data, you have the old key to establish that relationship. Simply join the #Employee table (with the OldKey inserted) to the next source dimension by the OldKey to bring over the Dimensions as they were setup from the original data, but with new keys being established.

If you are certain the Source data is all new data, you can bypass the match / not match logic by changing the code with

ON (1=0) –force a not match condition

WHEN NOT MATCHED

THEN

And still utilize the OUTPUT clause to track the Key values.