Reformat Department Table | LeetCode | MSSQL

ranggakd

Retiago Drago

Posted on June 7, 2023

Reformat Department Table | LeetCode | MSSQL

The Problem

Let's first look at the problem. We have a Department table structured as follows:

Column Name Type
id int
revenue int
month varchar

(id, month) is the primary key for this table. The table contains the revenue of each department per month, and the month column values are in the form of ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"].

The task at hand is to write an SQL query to reformat the table such that there is a department id column and a revenue column for each month. We will return the result table in any order. The expected query result format is shown in the following example.

Input:

id revenue month
1 8000 Jan
2 9000 Jan
3 10000 Feb
1 7000 Feb
1 6000 Mar

Output:

id Jan_Revenue Feb_Revenue Mar_Revenue ... Dec_Revenue
1 8000 7000 6000 ... null
2 9000 null null ... null
3 null 10000 null ... null

The revenue from April to December is null in this example. The result table will have 13 columns: 1 for the department id and 12 for the months.

The Solution

We are going to explore four SQL solutions to this problem, each with a distinct approach. We will highlight their main differences, strengths and weaknesses, and structures.

Source Code 1

The first approach is the most straightforward one, using the PIVOT function to transform the data from rows into columns:

SELECT 
    id, 
    [Jan] AS Jan_Revenue, 
    [Feb] AS Feb_Revenue, 
    [Mar] AS Mar_Revenue, 
    [Apr] AS Apr_Revenue, 
    [May] AS May_Revenue, 
    [Jun] AS Jun_Revenue, 
    [Jul] AS Jul_Revenue, 
    [Aug] AS Aug_Revenue, 
    [Sep] AS Sep_Revenue, 
    [Oct] AS Oct_Revenue, 
    [Nov] AS Nov_Revenue, 
    [Dec] AS Dec_Revenue
FROM 
    (
        SELECT id, month, revenue 
        FROM Department
    ) AS SourceTable
PIVOT
(
    SUM(revenue)
    FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) AS PivotTable
Enter fullscreen mode Exit fullscreen mode

This approach resulted in a runtime of 1443ms, beating 79.74% of other solutions.

s1

Source Code 2

The second approach is similar to the first one but simplifies the SQL syntax. Instead of using an alias for the source table, we directly use the Department table:

SELECT 
    id, 
    [Jan] [Jan_Revenue], 
    [Feb] [Feb_Revenue], 
    [Mar] [Mar_Revenue], 
    [Apr] [Apr_Revenue], 
    [May] [May_Revenue], 
    [Jun] [Jun_Revenue], 
    [Jul] [Jul_Revenue], 
    [Aug] [Aug_Revenue], 
    [Sep] [Sep_Revenue], 
    [Oct] [Oct_Revenue], 
    [Nov] [Nov_Revenue], 
    [Dec] [Dec_Revenue]
FROM Department
PIVOT
(
    SUM(revenue)
    FOR month IN ([Jan], [Feb], [Mar], [Apr], [May], [Jun], [Jul], [Aug], [Sep], [Oct], [Nov], [Dec])
) [PivotTable]
Enter fullscreen mode Exit fullscreen mode

This change results in a slightly faster runtime of 1390ms, beating 89.9% of other solutions.

s2

Source Code 3

The third approach takes a different path, eschewing the PIVOT function in favor of a CASE expression inside a SUM function:

SELECT
    id,
    SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) [Jan_Revenue],
    SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) [Feb_Revenue],
    SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) [Mar_Revenue],
    SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) [Apr_Revenue],
    SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) [May_Revenue],
    SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) [Jun_Revenue],
    SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) [Jul_Revenue],
    SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) [Aug_Revenue],
    SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) [Sep_Revenue],
    SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) [Oct_Revenue],
    SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) [Nov_Revenue],
    SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) [Dec_Revenue]
FROM Department
GROUP BY id
Enter fullscreen mode Exit fullscreen mode

Although this approach is more readable and easier to understand, the execution is significantly slower, with a runtime of 3372ms and beating only 5.20% of other solutions.

s3

Source Code 4

The final approach utilizes window functions, specifically the OVER clause with PARTITION BY. This allows us to perform the calculation within each department's context:

SELECT DISTINCT 
    id,
    SUM(CASE WHEN month = 'Jan' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jan_Revenue],
    SUM(CASE WHEN month = 'Feb' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Feb_Revenue],
    SUM(CASE WHEN month = 'Mar' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Mar_Revenue],
    SUM(CASE WHEN month = 'Apr' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Apr_Revenue],
    SUM(CASE WHEN month = 'May' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [May_Revenue],
    SUM(CASE WHEN month = 'Jun' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jun_Revenue],
    SUM(CASE WHEN month = 'Jul' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Jul_Revenue],
    SUM(CASE WHEN month = 'Aug' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Aug_Revenue],
    SUM(CASE WHEN month = 'Sep' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Sep_Revenue],
    SUM(CASE WHEN month = 'Oct' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Oct_Revenue],
    SUM(CASE WHEN month = 'Nov' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Nov_Revenue],
    SUM(CASE WHEN month = 'Dec' THEN revenue ELSE NULL END) OVER (PARTITION BY id) [Dec_Revenue]
FROM Department
Enter fullscreen mode Exit fullscreen mode

The window function approach has a runtime of 2198ms, beating 16.89% of other submissions.

s4

Conclusion

All the presented solutions solve the problem, but they differ in efficiency and readability.

The second approach seems to perform the best, beating 89.9% of other solutions. The first approach follows closely, with a performance beating 79.74% of submissions. The fourth approach, using window functions, outperforms only 16.89% of the submissions, and the third approach is the least efficient, outperforming only 5.2%.

However, keep in mind that these efficiency ratings come from LeetCode and might not always reflect performance in a real-world RDBMS setting. For instance, the third approach might perform better in certain database systems that optimize CASE expressions well.

Ultimately, the choice between these methods depends on your specific use case, system, and personal preferences for readability and maintainability.

You can find the original problem at LeetCode.

For more insightful solutions and tech-related content, feel free to connect with me on my Beacons page.

ranggakd - Link in Bio & Creator Tools | Beacons

@ranggakd | center details summary summary Oh hello there I m a an Programmer AI Tech Writer Data Practitioner Statistics Math Addict Open Source Contributor Quantum Computing Enthusiast details center.

favicon beacons.ai
💖 💪 🙅 🚩
ranggakd
Retiago Drago

Posted on June 7, 2023

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related