Introduction
From the perspective of data management, a table in SQL is a logical representation of facts as a multiset of
rows. And by its very definition rows are unordered for all logical purposes. Therefore sorting a set of rows
based on values in one or more column in ascending or descending fashion is a very common and useful data
display requirement. Quite often the applications that display an ordered set of records can benefit from having
a mechanism for associating a sequentially incrementing or decrementing numerical value -- often known as rank.
This article deals with a few of the common approaches that allow the user to take advantage of various Transact
SQL constructs in generating rank values.
Considerations
Before proceeding to the various alternatives, the reader is urged to read the commonly referred knowledge base
article on the topic: 186133. Later
in the article a section is devoted to explaining the various approaches in dealing with paging – which is simply
an extension of the various ranking methods.
All the examples in the following section are based on the tables from the sample database, Northwind.
Ranking based on single row
Here is an example of generating rank values based on a correlated subquery on the sequence of ContactName
SELECT c1.ContactName,
( SELECT COUNT(*)
FROM Customers c2
WHERE c2.ContactName <= c1.ContactName ) AS "rank"
FROM Customers c1
ORDER BY c1.ContactName ;
The same logic is being re-written using a self-join
SELECT c1.ContactName, COUNT(*) AS "rank"
FROM Customers c1
JOIN Customers c2
ON c2.ContactName <= c1.ContactName
GROUP BY c1.ContactName
ORDER BY c1.ContactName;
Using one of the windowing functions ROW_NUMBER() or RANK(), the values can be generated more easily.
SELECT c1.ContactName,
ROW_NUMBER() OVER ( ORDER BY ContactName ) AS "rank"
FROM Customers c1
ORDER BY c1.ContactName;
Ranking based on multiple columns where the ranking is done based on the grouping of another column
Here is an example with CustomerID as the grouped column and OrderDate as the ranked column from the Orders table
SELECT o1.CustomerID, o1.OrderDate,
( SELECT COUNT(*)
FROM dbo.Orders o2
WHERE o2.CustomerID = o1.CustomerID
AND o2.OrderDate <= o1.OrderDate ) AS "rank"
FROM dbo.Orders o1
ORDER BY o1.CustomerID, o1.OrderDate ;
With ROW_NUMBER() function, one can use the PARTITION BY clause to group by another column.
SELECT CustomerID, OrderDate,
ROW_NUMBER() OVER ( PARTITION BY CustomerID ORDER BY OrderDate ) AS "rank"
FROM Orders
ORDER BY CustomerID, OrderDate ;
For ranking requirements avoid using IDENTITY columns, since it requires additional tables ( either
temporary tables or in some cases, table variables ). Also, there are some approaches that exploit
the procedural looping using cursors or similar sort that are not often recommended.
Paging solutions
When the resultset of a particular query is so large that the client interfaces cannot display them
as a single set of records, often various approaches to split the resultset into multiple resultsets
of predefined size are considered. In general development circles, this is known as paging. Paging
requirements are very common while designing web sites where users can navigate through the resultset
one page at a time.
Using the above ranking methods, to get the customers that are ranked between 5 and 10, you can simply add
a WHERE clause
SELECT *
FROM Customers c1
WHERE ( SELECT COUNT(*)
FROM Customers c2
WHERE c2.ContactName <= c1.ContactName ) BETWEEN 5 AND 10
ORDER BY c1.ContactName ;
This is the same logic as re-written using a self join
SELECT c1.ContactName
FROM Customers c1
JOIN Customers c2
ON c2.ContactName <= c1.ContactName
GROUP BY c1.ContactName
HAVING COUNT(*) BETWEEN 5 AND 10
ORDER BY c1.ContactName;
Note that in such instances, if there are ties in the ContactName values, then we would have to use a tie-breaker
column. In that case the correlation in the subquery would change like:
WHERE c2.ContactName <= c1.ContactName
OR ( c2.tie-breaker = c1.tie-breaker AND c2.ContactName <= c1.ContactName )
Another method is to encapsulate the subquery into a scalar UDF along the lines of:
CREATE FUNCTION dbo.rank ( @cName NVARCHAR(60) )
RETURNS INT AS BEGIN
RETURN( SELECT COUNT(*) FROM Customers
WHERE ContactName <= @cName )
END
With this UDF, the query can be as simple as:
SELECT *
FROM Customers
WHERE dbo.rank ( ContactName ) BETWEEN 5 AND 10 ;
The windowing functions cannot be used directly in the WHERE clause and so, to use them we will have to use a
derived table or a Common Table Expression like:
; WITH CTE ( CustomerID, OrderDate, rank ) AS (
SELECT CustomerID, OrderDate,
ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS "rank"
FROM Orders )
SELECT *
FROM CTE
WHERE rank BETWEEN 5 AND 10 ;
When stored procedures are used to handle such queries that facilitate ranking, input parameters for page size
and page number are often passed in. In such cases, the parameters can be used like:
CREATE PROCEDURE usp ( @page_nbr INT, @page_size INT )
AS
; WITH CTE ( CustomerID, OrderDate, rank ) AS (
SELECT CustomerID, OrderDate,
ROW_NUMBER() OVER ( ORDER BY OrderDate ) AS "rank"
FROM Orders )
SELECT CustomerID, OrderDate, rank
FROM CTE
WHERE rank > ( @page_nbr - 1 ) * @page_size
AND rank <= @page_nbr * @page_size ;
Regardless of which method is used, all the various factors should be carefully weighed in before deciding to
use a particular ranking or paging solution. Since each of the methods detailed above can be impacted by various
aspects of the physical model, make sure to test the solutions against appropriate datasets before making claims
regarding performance.
Some additional references
How to dynamically number rows in a SELECT Transact-SQL statement ( Microsoft Support )
How do I page through a recordset? ( Aaron Bertrand )
|