expand_grid: Create a dataframe from all combinations of inputs.

Background

This notebook shows examples of how expand_grid works. expand_grid aims to offer functionality similar to the expand_grid function in R's tidyr package.

expand_grid creates a dataframe containing every combination of its inputs.

The inputs must be passed as a dictionary through the others argument. If expand_grid is called as a method on a dataframe, a key for that dataframe (df_key) must be provided as well.

Some of the examples here come from tidyr's expand_grid page and from the pandas cookbook.

import numpy as np
import pandas as pd
from janitor import expand_grid
data = {"x": [1, 2, 3], "y": [1, 2]}

result = expand_grid(others=data)

result
   x  y
   0  0
0  1  1
1  1  2
2  2  1
3  2  2
4  3  1
5  3  2
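
The row order above is that of a Cartesian product in which the last input varies fastest. A minimal sketch of the same idea using only the standard library's itertools.product; this illustrates the concept and is not necessarily how pyjanitor implements expand_grid:

```python
from itertools import product

data = {"x": [1, 2, 3], "y": [1, 2]}

# product() yields one tuple per combination; the rightmost
# input ("y") cycles fastest, as in the table above.
rows = list(product(*data.values()))

for row in rows:
    # Pair each value with its input name for readability.
    print(dict(zip(data, row)))
```
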
# combination of letters

data = {"l1": list("abcde"), "l2": list("ABCDE")}

letters = expand_grid(others=data)

letters.head(10)
  l1 l2
   0  0
0  a  A
1  a  B
2  a  C
3  a  D
4  a  E
5  b  A
6  b  B
7  b  C
8  b  D
9  b  E
data = {"height": [60, 70], "weight": [100, 140, 180], "sex": ["Male", "Female"]}

result = expand_grid(others=data)

result
    height  weight     sex
         0       0       0
0       60     100    Male
1       60     100  Female
2       60     140    Male
3       60     140  Female
4       60     180    Male
5       60     180  Female
6       70     100    Male
7       70     100  Female
8       70     140    Male
9       70     140  Female
10      70     180    Male
11      70     180  Female
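
The twelve rows above follow from the input lengths: the result has one row per combination, so the row count is the product of the lengths (2 * 3 * 2). A short standard-library check, offered as an illustration rather than as part of pyjanitor's API:

```python
from itertools import product
from math import prod

data = {"height": [60, 70], "weight": [100, 140, 180], "sex": ["Male", "Female"]}

# Expected number of rows: the product of the input lengths.
expected_rows = prod(len(values) for values in data.values())

# Count the combinations directly to confirm.
actual_rows = sum(1 for _ in product(*data.values()))

print(expected_rows, actual_rows)
```

Because the count is multiplicative, adding inputs or lengthening them grows the result quickly.
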
# A dictionary of arrays
# Arrays must be 1- or 2-dimensional

data = {"x1": np.array([[1, 3], [2, 4]]), "x2": np.array([[5, 7], [6, 8]])}

result = expand_grid(others=data)

result
  x1    x2
   0  1  0  1
0  1  3  5  7
1  1  3  6  8
2  2  4  5  7
3  2  4  6  8
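
For 2-dimensional arrays, the output above combines whole rows: each row of x1 is paired with each row of x2. A rough NumPy-only sketch of that pairing, assuming np.repeat and np.tile as the building blocks (pyjanitor's internals may differ):

```python
import numpy as np

x1 = np.array([[1, 3], [2, 4]])
x2 = np.array([[5, 7], [6, 8]])

# Repeat each row of x1 once per row of x2 ...
left = np.repeat(x1, len(x2), axis=0)
# ... and cycle through the rows of x2 once per row of x1.
right = np.tile(x2, (len(x1), 1))

# Place the paired rows side by side.
combined = np.hstack([left, right])
print(combined)
```
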
# This shows how to method chain expand_grid
# to an existing dataframe

df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
data = {"z": [1, 2, 3]}

# A key must be passed in for the dataframe;
# it becomes the top level of that dataframe's column names

result = df.expand_grid(df_key="df", others=data)

result
  df     z
   x  y  0
0  1  2  1
1  1  2  2
2  1  2  3
3  2  1  1
4  2  1  2
5  2  1  3
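
Pairing every row of a dataframe with every value of another input is a cross join. For comparison, a plain-pandas sketch using merge with how="cross" (available in pandas 1.2 and later); unlike the expand_grid result above, it returns flat column names, and the helper name z is chosen here only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
z = pd.DataFrame({"z": [1, 2, 3]})

# how="cross" pairs every row of df with every row of z.
crossed = df.merge(z, how="cross")
print(crossed)
```
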
# expand_grid can work on multiple dataframes
# Ensure that there are keys
# for each dataframe in the dictionary

df1 = pd.DataFrame({"x": range(1, 3), "y": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"x": [2, 3], "y": ["a", "b"]})

data = {"df1": df1, "df2": df2, "df3": df3}

result = expand_grid(others=data)

result
   df1    df2    df3
     x  y   x  y   x  y
0    1  2   1  3   2  a
1    1  2   1  3   3  b
2    1  2   2  2   2  a
3    1  2   2  2   3  b
4    1  2   3  1   2  a
5    1  2   3  1   3  b
6    2  1   1  3   2  a
7    2  1   1  3   3  b
8    2  1   2  2   2  a
9    2  1   2  2   3  b
10   2  1   3  1   2  a
11   2  1   3  1   3  b
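
The two-level column layout above, with the dictionary keys on top, can be reproduced in plain pandas. The sketch below is an assumption-laden illustration, not pyjanitor's implementation: it builds every combination of row positions with itertools.product, selects those rows from each dataframe, and relies on pd.concat using dictionary keys as the top column level.

```python
from itertools import product

import pandas as pd

frames = {
    "df1": pd.DataFrame({"x": range(1, 3), "y": [2, 1]}),
    "df2": pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]}),
    "df3": pd.DataFrame({"x": [2, 3], "y": ["a", "b"]}),
}

# Every combination of row positions, one position per dataframe.
positions = list(product(*(range(len(frame)) for frame in frames.values())))

# Select the matching rows from each dataframe and reset the index
# so that the pieces line up row by row.
parts = {
    key: frame.iloc[[pos[i] for pos in positions]].reset_index(drop=True)
    for i, (key, frame) in enumerate(frames.items())
}

# Concatenating a dict along the columns uses its keys as the top column level.
combined = pd.concat(parts, axis=1)
print(combined)
```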

Columns can be flattened with pyjanitor's collapse_levels:

result.collapse_levels()
    df1_x  df1_y  df2_x  df2_y  df3_x  df3_y
0       1      2      1      3      2      a
1       1      2      1      3      3      b
2       1      2      2      2      2      a
3       1      2      2      2      3      b
4       1      2      3      1      2      a
5       1      2      3      1      3      b
6       2      1      1      3      2      a
7       2      1      1      3      3      b
8       2      1      2      2      2      a
9       2      1      2      2      3      b
10      2      1      3      1      2      a
11      2      1      3      1      3      b
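
Where pyjanitor is not available, a comparable flattening can be sketched in plain pandas by joining the levels of each column label. The underscore separator is chosen to mirror the output above, and the one-row dataframe is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 2, "a"]],
    columns=pd.MultiIndex.from_tuples(
        [("df1", "x"), ("df1", "y"), ("df3", "x"), ("df3", "y")]
    ),
)

# Join the levels of each column label with an underscore.
df.columns = ["_".join(map(str, label)) for label in df.columns]
print(df.columns.tolist())
```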

Alternatively, a column level can be dropped with pandas' droplevel method:

letters.droplevel(level=-1, axis="columns").head(10)
  l1 l2
0  a  A
1  a  B
2  a  C
3  a  D
4  a  E
5  b  A
6  b  B
7  b  C
8  b  D
9  b  E