Groupby_agg : Shortcut for assigning a groupby-transform to a new column.
Background
This notebook serves to show how to use the groupby_agg method from pyjanitor's general functions submodule.
The groupby_agg method allows us to add the result of an aggregation from a grouping, as a new column, back to the dataframe.
Currently in pandas, to append a column back to a dataframe, you do it in three steps: 1. Groupby a column or columns 2. Apply the transform method with an aggregate function on the grouping, and finally 3. Assign the result of the transform to a new column in the dataframe.
The groupby_agg allows you to replicate the same process in one step and with sensible arguments. The example below illustrates this function
# load modules
import numpy as np
import pandas as pd
from janitor import groupby_agg
data = {
"item": ["shoe", "shoe", "bag", "shoe", "bag"],
"MRP": [220, 450, 320, 200, 305],
"number_sold": [100, 40, 56, 38, 25],
}
df = pd.DataFrame(data)
df
Use grouby_agg to find average price for each item and append column to dataframe
df = df.groupby_agg(
by="item", agg="mean", agg_column_name="MRP", new_column_name="Avg_MRP"
)
df
Null cells are retained as well :
df = pd.DataFrame(
{
"name": ("black", "black", "black", "red", "red"),
"type": ("chair", "chair", "sofa", "sofa", "plate"),
"num": (4, 5, 12, 4, 3),
"nulls": (1, 1, np.nan, np.nan, 3),
}
)
df
filtered_df = df.groupby_agg(
by=["nulls"], agg="size", agg_column_name="type", new_column_name="counter"
)
filtered_df
The groupby_agg method can be extended for different purposes. One of these is groupwise filtering, where only groups that meet a condition are retained. Let's explore this with an example:
filtered_df = df.groupby_agg(
by=["name", "type"], agg="size", agg_column_name="type", new_column_name="counter"
).query("counter > 1")
filtered_df