Wednesday, October 20, 2010

Different ranking functions in SQL Server 2005 and 2008 and strategy to remove duplicate records in ETL type apps

Quick glance of different ranking functions that we can use in SQL Server -

Ranking functions return a ranking value for each row in a partition. Depending on the function that is used, some rows might receive the same value as other rows. Ranking functions are nondeterministic.

SQL Server introduced four different ranking functions to rank the rows in partition or in general -
1 RANK
2 NTILE
3 DENSE_RANK
4 ROW_NUMBER

http://msdn.microsoft.com/en-us/library/ms189798.aspx

Example on different strategies to remove duplicate records in SQL server in a typical ETL OR OLTP type applications -

http://www.mssqltips.com/tip.asp?tip=1952

http://www.mssqltips.com/tip.asp?tip=1918

Thanks - D

Typical Sync up of tables scenario in ETL type projects

If you are faced by a scenario where you typically have to sync up 2 tables by comparing source, destination most of the times and writing Insert, Delete and Update statements to achieve that, SQL Server 2008 introduces MERGE feature which is very powerful and fairly easy to achieve such sync up issues quick.

Example - Products dimension table has information about the products; you need to sync-up this table with the latest information about the products from the source table. You would need to write separate INSERT, UPDATE and DELETE statements to refresh the target table with an updated product list or do lookups. Though it seems to be straight forward at first glance, but it becomes cumbersome when you have do it very often or on multiple tables, even the performance degrades significantly with this approach. In the article below the author walks you through how to use the MERGE statement and do this in one pass. :)

USE - The MERGE statement basically merges data from a source result set to a target table based on a condition that you specify and if the data from the source already exists in the target or not. The new SQL command combines the sequence of conditional INSERT, UPDATE and DELETE commands in a single atomic statement, depending on the existence of a record. The new MERGE SQL command looks like as below:


SYNTAX - MERGE [AS TARGET]
USING [AS SOURCE]
ON
[WHEN MATCHED
THEN ]
[WHEN NOT MATCHED [BY TARGET]
THEN ]
[WHEN NOT MATCHED BY SOURCE
THEN ];


Nice and easy article for quick reference on MERGE - http://www.mssqltips.com/tip.asp?tip=1704

Thanks - D

Use of "Coalesce" in SQL Server 2008

Use of "Coalesce" in SQL Server - efficent way to script and avoid mutiple Case statements and ISNULL.

COALESCE - Returns the first nonnull expression among its arguments.

Syntax
COALESCE ( expression [ ,...n ] )

Arguments
expression - Is an expression of any type.

n - Is a placeholder indicating that multiple expressions can be specified. All expressions must be of the same type or must be implicitly convertible to the same type.

Return Types - Returns the same value as expression.

Remarks - If all arguments are NULL, COALESCE returns NULL.

COALESCE(expression1,...n) is equivalent to this CASE function:

CASE
WHEN (expression1 IS NOT NULL) THEN expression1
...
WHEN (expressionN IS NOT NULL) THEN expressionN
ELSE NULL

Simple real life example from MSDN - http://msdn.microsoft.com/en-us/library/aa258244(SQL.80).aspx

Other many uses as well - http://www.mssqltips.com/tip.asp?tip=1521

(Fav one from this article = was to KILL all transactions forecfully that are locked and needed to be released before restoring the database -

------------------------------------------
DECLARE @SQL VARCHAR(8000)

SELECT @SQL=COALESCE(@SQL,'')+'Kill '+CAST(spid AS VARCHAR(10))+ '; '
FROM sys.sysprocesses
WHERE DBID=DB_ID('AdventureWorks')

PRINT @SQL --EXEC(@SQL)

Replace the print statement with exec to execute will give you a result set such as the following.
-----------------------------------------------

Thanks - D