Calculate Percent in Pandas Pivot Table
Pandas is a powerful Python library for data manipulation and analysis, and one of its most useful features is the ability to create pivot tables. Pivot tables can help you summarize and analyze large datasets quickly and efficiently, and Pandas makes it easy to create them using the pivot_table()
function. You can check out the Pandas Pivot Tutorial if you haven’t already
One common task when working with pivot tables is to calculate percentages. In this blog post, we’ll look at a few examples of how to do that using Pandas.
Lets create some data for us to then pivot
import pandas as pd
data = {'salesperson': ['Alice', 'Bob', 'Charlie', 'Alice', 'Charlie', 'Bob'],
'product': ['A', 'B', 'C', 'A', 'B', 'C'],
'sales_amount': [100, 200, 300, 150, 250, 200]}
df = pd.DataFrame(data)
Use Lambda to Create Percentage in Python Pandas Pivot Tables
We use a lambda function to calculate the percentage of column for each dimension in the row. The lambda function takes the values in the pivot table as input and applies the calculation that we can insert into aggfunc section of the pivot table. Here is an example of a lambda function lambda x: sum(x) / sum(df['column']) * 100)
to them.
Suppose you have a dataset of sales data, with columns for the salesperson’s name, the product sold, and the sales amount. You want to create a pivot table that shows the percentage of total sales for each salesperson. Here’s how to do it:
Percent of Grand Total in a Pandas Pivot Table
pivot_table = pd.pivot_table(df,
values='sales_amount',
index='salesperson',
columns='product',
fill_value=0, aggfunc=lambda x: round(sum(x)/sum(df['sales_amount']) * 100, 2))
pivot_table
Note that we are using the fill_value parameter to add zeros to avoid the NA values.
Percent of Column Total in a Pandas Pivot Table
To get the percent of the column total we can use the same lambda function. However, we will need to apply it to the function to the columns. We can do this using the apply function. This needs to be applied after the table has been created. Use the pandas data frame from above to practice.
pd.pivot_table(data=df, index='Salesperson',
columns='Product',
values='Sales',
aggfunc=sum,
fill_value=0).apply(lambda x: x*100/sum(x))
Percent of Row Total in a Pandas Pivot Table
Now we will need to take a few extra steps to achieve this. We will create the pandas pivot table as we did in the first section. Once the table has been completed, we can now use the div() function. The div function will allow us specify how to divide the values in one DataFrame by the values in another DataFrame or a scalar value. The div()
function takes several parameters, including other
, fill_value
, level
, and axis
. Since we can specify the axis we do take the percent of the row total. However, we need the row total which we can get with the margins function in our pivot table. Lets see how all this works together
table = pd.pivot_table(data=df, index='Salesperson',
columns='Product',
values='Sales',
aggfunc=sum,
fill_value=0,
margins=True)
table.div(table.iloc[:,-1], axis=0 )
Using margins = True parameter will give us the total sum of the columns in a new column and specifying thata column using iloc[:,-1] within the divide function to get the percent of row total.