Have you ever been in a situation where you are selecting records from a database that need to be ranked, but the column(s) you’re attempting to ORDER BY are not unique. For example, using the Orders table below, how could you display a distinct list of customers and their last order? The problem is that the date field contains only the date, but not the time. Therefore, it’s possible that two orders can be placed by the same customer on the same day. Now if we had a field defined as AUTO_INCREMENT in MySQL or IDENTITY in Microsoft SQL Server and the records were entered sequentially, this would be a simple task.

Orders Table

Example:
CustomerOrderDateAmount
Jane2011-01-0512
Jane2011-01-0715
Jane2011-01-0717
John2011-01-0111
John2011-01-0227
John2011-01-0213
Pat2011-02-055
Pat2011-02-0734
Pat2011-02-0712

This can be solved in MS SQL , and with a little more code in MySQL , as well. In MS SQL Server 2005+, uniquely identifying the above records is a breeze using the ROW_NUMBER() function. If your not familiar with ROW_NUMBER(), MSDN defines theT-SQL ROW_NUMBER()as“Returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.”

So, in order to display a distinct list of customers and uniquely identify their last order, we could write something like:

SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
       ,Customer
       ,OrderDate
       ,Amount
  FROM Orders
Results:
RowNumberCustomerOrderDateAmount
1Jane2011-01-0715
2Jane2011-01-0717
3Jane2011-01-0512
1John2011-01-0227
2John2011-01-0213
3John2011-01-0111
1Pat2011-02-0734
2Pat2011-02-0712
3Pat2011-02-055

Notice how a unique row number is now apparent on each row within the partition. The next step would be to encompasses the statement in a sub query or Common Table Expression (CTE), and filter out the unwanted records based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
      (
        SELECT  ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY OrderDate DESC) AS RowNumber
               ,Customer
               ,OrderDate
               ,Amount
          FROM Orders
      ) subquery WHERE RowNumber = 1
Results:
CustomerOrderDateAmount
Jane2011-01-0715
John2011-01-0227
Pat2011-02-0734

The result is a single record for each customer, even when the customer has more than one order on the same day.


MySQL Implementation

Recently, I ran into a similar situation on a WordPress implementation, but with Horses not Orders/Customers. The requirement was to display each horse’s last workout and next race. MySQL does not have a ROW_NUMBER() function. However, MySQL does allow for inline assignment of variables and the ability to reassign and reference those variables as the query is working its way through the execution. This allows the same functionality that ROW_NUMBER() provides to be achieved in MySQL. Sticking to the same example used above the MySQL solution would be.

SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
       ,o.Customer
       ,o.OrderDate
       ,o.Amount
       ,@prev_value := o.Customer
  FROM Orders o,
      (SELECT @row_num := 1) x,
      (SELECT @prev_value := '') y
  ORDER BY o.Customer, o.OrderDate DESC
Results:
RowNumberCustomerOrderDateAmount
1Jane2011-01-0715
2Jane2011-01-0717
3Jane2011-01-0512
1John2011-01-0227
2John2011-01-0213
3John2011-01-0111
1Pat2011-02-0734
2Pat2011-02-0712
3Pat2011-02-055

A unique row number is now apparent on each row with in the partition. The @row_num variable holds the current row number and the @pev_value variable holds the current value of the partition by field. The variables are defined and assigned a default value with in subqueries. The Orders table and the subqueries are then combined in a single select statement. The @row_num variable is incremented by one until the @prev_value does not equal the Customer and is then reset back to one.

Important

  • The @row_num variable must be set before the @prev_value variable
  • The first field in the ORDER BY must be the field that you are partitioning by
  • The default value assigned to the @prev_value variable must not exist in the partition by field

As we did with the MS SQL ROW_NUMBER() example, we will need to encompass the statement in a sub query in order to filter based on the generated row number.

SELECT  Customer
       ,OrderDate
       ,Amount
  FROM
     (
      SELECT  @row_num := IF(@prev_value=o.Customer,@row_num+1,1) AS RowNumber
             ,o.Customer 
             ,o.OrderDate
             ,o.Amount
             ,@prev_value := o.Customer
        FROM Orders o,
             (SELECT @row_num := 1) x,
             (SELECT @prev_value := '') y
       ORDER BY o.Customer, o.OrderDate DESC
     ) subquery
 WHERE RowNumber = 1
Results:
CustomerOrderDateAmount
Jane2011-01-0715
John2011-01-0227
Pat2011-02-0734

The result is a single record for each customer, even when the customer had more than one order on the same day.

The best solution is to avoid being in this situation in the first place. However an existing data schema does not always lend itself to new requirements.

4 Comments

20120523
    •  11:41 am
    • Yevsey
    •  

    I have tried something similar to this. however, the issue i ran into is when i run the query in mySQL workbench, the results come out fine. if i call the stored procedure in VB, the row number always shows 1.

    -- -----------------------------------------------------
    -- procedure get_issue_data
    -- -----------------------------------------------------
     
    DELIMITER $$
    USE `unapproved`$$
    CREATE DEFINER=`root`@`localhost` PROCEDURE `get_issue_data`(IN startdate DATE,IN endDate DATE,branches VARCHAR(2000),top INT,bh bit)
        DETERMINISTIC
     
     
    BEGIN
     
    DECLARE seperator_count INT;
    DECLARE id_string VARCHAR(10);
    DECLARE test_if_done_string VARCHAR(1000);
    DECLARE a_id_string VARCHAR(10) ;
     
    DROP TABLE IF EXISTS tmpbranche;
     
    CREATE TABLE tmpbranche (pid INT);
    CREATE INDEX ix_pid ON tmpbranche(pid);
     
     
     
     
    IF ISNull(branches)= FALSE THEN
     
            SET seperator_count=1;
     
            -- This loop populates table with branch list
            parameter_loop: LOOP
     
                SET id_string = 
                                (SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(branches, ',', seperator_count), ',', -1)));
     
                INSERT INTO tmpbranche (pid) VALUES (id_string);
     
                SET test_if_done_string = 
                                (SELECT TRIM(SUBSTRING_INDEX(SUBSTRING_INDEX(branches, ',', seperator_count), ',', -seperator_count)));
     
                IF branches = test_if_done_string
                    THEN LEAVE parameter_loop;
                END IF;
     
                SET seperator_count = seperator_count + 1;
     
            END LOOP parameter_loop;
    END IF;
     
     
     
    IF  bh=0  OR IsNUll(bh)=TRUE THEN
        DROP TABLE IF EXISTS results;
     
        CREATE TEMPORARY TABLE results(end_date DATE,branch INT,issue_type VARCHAR(45),volume INT,ROW_NUMBER INT,dummy VARCHAR(45));
     
     
     
     
                INSERT INTO results(end_date,branch,issue_type,volume,ROW_NUMBER,dummy) 
     
                    SELECT
                       T.end_date,
                       T.branch,
                       T.issue_type,
                       T.volume,
                       @num := IF(@TYPE = end_date, @num + 1, 1) AS ROW_NUMBER,
                       @TYPE := end_date AS dummy
                    FROM  (SELECT tblPeriods.end_date,
                                                branch, 
                                                tblissueTypes.issue_type,
                                                COUNT(tblIssues.tblissueTypes_id) AS volume
                                            FROM tblInvoices, tblIssues,tblIssueTypes,tblPeriods
                                            WHERE tblIssues.tblInvoices_id=tblInvoices.id 
                                            AND tblIssueTypes.id=tblIssues.tblissueTypes_id AND 
                                            tblInvoices.branch IN (SELECT pid FROM tmpbranche) AND 
                                            (tblPeriods.start_date BETWEEN startdate AND enddate OR tblPeriods.end_date BETWEEN startdate AND enddate) AND
                                            tblinvoices.import_date BETWEEN tblPeriods.start_date AND tblPeriods.end_date AND tblIssues.tblIssueTypes_id NOT IN ('20','7')
                                            GROUP BY branch,tblPeriods.end_date,tblissueTypes.issue_type
                                            ORDER BY branch,tblPeriods.end_date,COUNT(tblIssues.tblissueTypes_id) DESC) T; 
     
     
    ELSE
        DROP TABLE IF EXISTS results;
     
        CREATE TEMPORARY TABLE results(end_date DATE,issue_type VARCHAR(45),volume INT,ROW_NUMBER INT,dummy VARCHAR(45));
     
        SET @num  = 1;
        INSERT INTO results(end_date,issue_type,volume,ROW_NUMBER,dummy) 
            SELECT
                T.end_date,
                T.issue_type,
                T.volume,
                @num := IF(@TYPE = end_date, @num + 1, 1) AS ROW_NUMBER,
                @TYPE := end_date AS dummy
            FROM  (SELECT tblPeriods.end_date,
                    tblissueTypes.issue_type,
                    COUNT(tblIssues.tblissueTypes_id) AS volume
                    FROM tblInvoices, tblIssues,tblIssueTypes,tblPeriods
                    WHERE tblIssues.tblInvoices_id=tblInvoices.id 
                    AND tblIssueTypes.id=tblIssues.tblissueTypes_id AND 
                    tblInvoices.branch IN (SELECT pid FROM tmpbranche) AND 
                    (tblPeriods.start_date BETWEEN startdate AND enddate OR tblPeriods.end_date BETWEEN startdate AND enddate) AND
                    tblinvoices.import_date BETWEEN tblPeriods.start_date AND tblPeriods.end_date AND tblIssues.tblIssueTypes_id NOT IN ('20','7')
                    GROUP BY tblPeriods.end_date,tblissueTypes.issue_type
                    ORDER BY tblPeriods.end_date,COUNT(tblIssues.tblissueTypes_id) DESC) T; 
    END IF;
     
     IF top=0  OR IsNUll(top)=TRUE THEN 
        SELECT * FROM results;
     ELSE
        SELECT * FROM results WHERE row_number<=top;
     END IF;
     
     DROP TABLE IF EXISTS tmpbranche;
     DROP TABLE IF EXISTS results;
     
    END
    
    Sub t()
    Dim objRec As New ADODB.Recordset
    Dim objCon As New ADODB.Connection
    Dim objSQL As New basMySQL
    Dim objCMD As New ADODB.Command
    
            objCon.ConnectionString = objSQL.ConnectionString
            objCon.Open
    
            Set objCMD = New ADODB.Command
    
            With objCMD
                .ActiveConnection = objCon
                .CommandText = "get_issue_data"
                .CommandType = adCmdStoredProc
                .Parameters.Append .CreateParameter("start_date", adDate, adParamInput, , Format(DateAdd("m", -12, Now()), "yyyy-mm-dd"))
                .Parameters.Append .CreateParameter("end_date", adDate, adParamInput, , Format(Now(), "yyyy-mm-dd"))
    
                If strBranches = vbNullString Or strBranches = "All" Then
                    .Parameters.Append .CreateParameter("branches", adVarChar, adParamInput, 2000, "2403,2406,2440,2441,2442,2446,2452")
                Else
                    .Parameters.Append .CreateParameter("branches", adVarChar, adParamInput, 2000, Replace(strBranches, "All,", ""))
                End If
    
                .Parameters.Append .CreateParameter("top", adInteger, adParamInput, , 5)
                .Parameters.Append .CreateParameter("bh", adBoolean, adParamInput, , 1)
    
            End With
    
            Set objRec = objCMD.Execute
    
            With objRec
                Do Until .EOF = True
                    Debug.Print .Fields(3).Value & " | " & .Fields(2).Value
    
                    .MoveNext
                Loop
            End With
    
    End Sub
    
    #44
20120226
  • I was able to get the same query to work with in a MySql Stored Procedure. How are you creating your stored procedure?

    here is the little example I wrote to test

    DELIMITER //
    CREATE PROCEDURE proctest()
    BEGIN
    SELECT col1
    ,col2
    ,col3
    FROM
    (
    SELECT @row_num := IF(@prev_value=o.col1,@row_num+1,1) AS RowNumber
    ,o.col1
    ,o.col2
    ,o.col3
    ,@prev_value := o.col1
    FROM table1 o,
    (SELECT @row_num := 1) x,
    (SELECT @prev_value := ”) y
    ORDER BY o.col1, o.col3 DESC
    ) subquery
    WHERE RowNumber = 1;
    END //
    DELIMITER ;

    #43
20120225
    •  6:43 pm
    • Sydney
    •  

    This is great, but I am trying to do something similar in a
    stored procedure and I get the dreaded “You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘:=” message.

    How can we do something similar in a stored procedure or from an SqlCommand object?

    #42
20120125
    •  2:17 pm
    • Giselle
    •  

    Thanks!!! Me ayudo mucho el tutorial :)

    #41

Leave a Reply

*
*