Home >> Transact SQL >> Ranking data in t-SQL

Ranking data in Transact SQL

Introduction

From the perspective of data management, a table in SQL is a logical representation of facts as a multiset of rows. And by its very definition rows are unordered for all logical purposes. Therefore sorting a set of rows based on values in one or more column in ascending or descending fashion is a very common and useful data display requirement. Quite often the applications that display an ordered set of records can benefit from having a mechanism for associating a sequentially incrementing or decrementing numerical value -- often known as rank.

This article deals with a few of the common approaches that allow the user to take advantage of various Transact SQL constructs in generating rank values.

Considerations

Before proceeding to the various alternatives, the reader is urged to read the commonly referred knowledge base article on the topic: 186133. Later in the article a section is devoted to explaining the various approaches in dealing with paging – which is simply an extension of the various ranking methods.

All the examples in the following section are based on the tables from the sample database, Northwind.

Ranking based on single row

Here is an example of generating rank values based on a correlated subquery on the sequence of ContactName
SELECT c1.ContactName, 
	( SELECT COUNT(*)
	    FROM Customers c2
	   WHERE c2.ContactName <= c1.ContactName ) AS "rank"
  FROM Customers c1
 ORDER BY c1.ContactName ;
The same logic is being re-written using a self-join
SELECT c1.ContactName, COUNT(*) AS "rank"
  FROM Customers c1
  JOIN Customers c2
    ON c2.ContactName <= c1.ContactName
 GROUP BY c1.ContactName
 ORDER BY c1.ContactName;
Using one of the windowing functions ROW_NUMBER() or RANK(), the values can be generated more easily.
SELECT c1.ContactName,
	ROW_NUMBER() OVER ( ORDER BY ContactName ) AS "rank"
  FROM Customers c1
 ORDER BY c1.ContactName;
Ranking based on multiple columns where the ranking is done based on the grouping of another column

Here is an example with CustomerID as the grouped column and OrderDate as the ranked column from the Orders table
SELECT o1.CustomerID, o1.OrderDate,
	( SELECT COUNT(*)
	    FROM dbo.Orders o2
	   WHERE o2.CustomerID = o1.CustomerID
	     AND o2.OrderDate <= o1.OrderDate ) AS "rank"
  FROM dbo.Orders o1
 ORDER BY o1.CustomerID, o1.OrderDate ;
With ROW_NUMBER() function, one can use the PARTITION BY clause to group by another column.
SELECT CustomerID, OrderDate,
	ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY OrderDate ) AS "rank"
  FROM Orders
 ORDER BY CustomerID, OrderDate ;
For ranking requirements avoid using IDENTITY columns, since it requires additional tables ( either temporary tables or in some cases, table variables ). Also, there are some approaches that exploit the procedural looping using cursors or similar sort that are not often recommended.

Paging solutions

When the resultset of a particular query is so large that the client interfaces cannot display them as a single set of records, often various approaches to split the resultset into multiple resultsets of predefined size are considered. In general development circles, this is known as paging. Paging requirements are very common while designing web sites where users can navigate through the resultset one page at a time.

Using the above ranking methods, to get the customers that are ranked between 5 and 10, you can simply add a WHERE clause

SELECT *
  FROM Customers c1
 WHERE ( SELECT COUNT(*)
	   FROM Customers c2
	  WHERE c2.ContactName <= c1.ContactName ) BETWEEN 5 AND 10
 ORDER BY c1.ContactName ;
This is the same logic as re-written using a self join
SELECT c1.ContactName
  FROM Customers c1
  JOIN Customers c2
    ON c2.ContactName <= c1.ContactName
 GROUP BY c1.ContactName
HAVING COUNT(*) BETWEEN 5 AND 10
 ORDER BY c1.ContactName;
Note that in such instances, if there are ties in the ContactName values, then we would have to use a tie-breaker column. In that case the correlation in the subquery would change like:
WHERE c2.ContactName <= c1.ContactName
   OR ( c2.tie-breaker = c1.tie-breaker AND c2.ContactName <= c1.ContactName ) 
Another method is to encapsulate the subquery into a scalar UDF along the lines of:
CREATE FUNCTION dbo.rank ( @cName NVARCHAR(60) )
RETURNS INT AS BEGIN
RETURN( SELECT COUNT(*) FROM Customers 
	 WHERE ContactName <= @cName )
END
With this UDF, the query can be as simple as:
SELECT *
  FROM Customers
 WHERE dbo.rank ( ContactName ) BETWEEN 5 AND 10 ;
The windowing functions cannot be used directly in the WHERE clause and so, to use them we will have to use a derived table or a Common Table Expression like:
; WITH CTE ( CustomerID, OrderDate, rank ) AS (
	SELECT CustomerID, OrderDate,
	       ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS "rank"
	  FROM Orders )
SELECT *
  FROM CTE
 WHERE rank BETWEEN 5 AND 10 ;	
When stored procedures are used to handle such queries that facilitate ranking, input parameters for page size and page number are often passed in. In such cases, the parameters can be used like:
CREATE PROCEDURE usp ( @page_nbr INT, @page_size INT )
AS
; WITH CTE ( CustomerID, OrderDate, rank ) AS (
	SELECT CustomerID, OrderDate,
	       ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS "rank"
	  FROM Orders )
SELECT CustomerID, OrderDate, rank 
  FROM CTE
 WHERE rank > ( @page_nbr - 1 ) * @page_size 
   AND rank <= @page_nbr * @page_size ;
Regardless of which method is used, all the various factors should be carefully weighed in before deciding to use a particular ranking or paging solution. Since each of the methods detailed above can be impacted by various aspects of the physical model, make sure to test the solutions against appropriate datasets before making claims regarding performance.

Some additional references

How to dynamically number rows in a SELECT Transact-SQL statement ( Microsoft Support )
How do I page through a recordset? ( Aaron Bertrand )