How to PIVOT in SQL
Pivot – This keyword in MS SQL Server is used to convert the rows to columns for a table and perform aggregations on a specific column. Please note, this keyword will not be able to remove/impact records from the table. It only interchanges the rows to columns in the output result set only. The original data in the table remains intact.
If you want to practice and learn more complex SQL check out Data Camp to practice SQL online.
More Info: This keyword works similar to “Pivot Table” in the Excel
Before we learn on how to operate/work with Pivot, let’s learn the scenarios where we need to change the rows into columns and aggregations in the first place. Please see the below screenshot of the data, we have company names and their respective departments in that company and the budget allocated for each of these departments.
This view is good if we have less number of records similar to the above, but if the records increase(if the list of companies are more) then this view will be quite challenging, the result set will grow vertically and the user will have discomfort scrolling down and down. Also, it is hard to compare one company department with another company department due to the vertical/hierarchy view.
Now this is where Pivot comes to the rescue, using pivot the “department_name” row values can be converted to columns, “budget_allocated” values can be spread across these columns.
Below is how the entire result set looks like
As you can see the result set is now wider and shorter(vertically), this gives the end user the advantage to compare the values efficiently with a quick glance
Syntax : From the MSDN Official Site
SELECT <non-pivoted column>, [first pivoted column] AS <column name>, FROM (<SELECT query that produces the data>) AS <alias for the source query> PIVOT ( <aggregation function>(<column being aggregated>) FOR [<column that contains the values that will become column headers>] IN ([first pivoted column] ) ) AS <alias for the pivot table>
Let’s try to understand the terminology first.
Pivoted column – This is the column which you can to convert the rows to columns, in the above data it will be “department_name”
(first pivoted column….last pivoted column) – Now once we try to convert the pivoted column rows to column, then each distinct row will be a column, example : “Finance”, “Tech”, “Marketing” are now the new columns which are created
Non- pivoted column – This is the column which you do not wish to convert rows to column, example “company_name”
Aggregation functions – count,avg,sum,max,min are the functions which can be applied on the values column. Example : “budget_allocated” column.
- The actual query starts with the non-pivoted columns; in this case, it is “company_name”
- Followed by the 1-nth pivoted column names, in this case, the “Finance”, “Tech” and “Marketing”, we are required to mention these, since these are the new columns in the result set
- We need to give the table name or subquery (Very Important: I will explain why subquery needs to be used in the next section) from which the data resides
- Then the “Pivot” function is used, in which we need to specify the column which are used to display the aggregate values of a column. In this case, it will be sum(budget_allocated). You may ask, why do we need the “sum” since all I need is the values to be displayed. Here without an aggregate method, we cannot display the values. Due to this we can use sum, and even with sum, we shall get the same result
“FOR” is a keyword with which we are required to specify from which column the rows are been converted to column. The respective new column name (distinct values of the column) as shown above. The best practice is to enclose the new column names in “” so that any spaces in the new column names can be escaped. Example : Tech and Ops or [Tech and Ops]
Very Important: Till now we had only 3 columns in the company_budget_allocations table, let’s add a new column to the table.
Now let’s run the previous pivot table query and see the results
As you can see due to the addition of the “id” column in the company_budget_allocations table, the entire pivot result set got impacted and the output looks distorted.
The reason for this is the “from” clause table. We provided the table name directly here, which essentially contains the “id” column as well.
The resolution for this is, instead of directly specifying the tablename after the “from” clause, use a sub query, which essentially only gets the required columns, non-pivot columns, pivoted columns and aggregate column. Please view the below updated query and results for comparison. Note: In your general select query as well, it is always recommended that you mention the columns from which the table’s data needs to extracted, since that will speed up the query execution and improves the performance of the overall select query. Try to avoid using “*” in the select to extract data.