expand_grid: Create a dataframe from all combinations of inputs.

Background

This notebook shows examples of how expand_grid works. expand_grid aims to offer functionality similar to the expand_grid function in R's tidyr package.

expand_grid creates a dataframe from every combination of the inputs.

The inputs must be provided as a dictionary. If a dataframe is also passed in (for example, when method chaining), a key must be provided for it as well.

Some of the examples used here are from tidyr's expand_grid page and from the pandas cookbook.
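To make "every combination of the inputs" concrete, here is a minimal sketch that uses only the standard library. It is not how expand_grid is implemented; the helper name expand_grid_rows is hypothetical and only illustrates the row order to expect, with the first key varying slowest.

```python
from itertools import product


def expand_grid_rows(others):
    """Return (column names, rows) for every combination of the input values.

    Hypothetical helper for illustration only; not part of pyjanitor.
    """
    names = list(others)
    # product() varies the last input fastest and the first input slowest
    rows = list(product(*others.values()))
    return names, rows


names, rows = expand_grid_rows({"x": [1, 2, 3], "y": [1, 2]})
print(names)  # ['x', 'y']
for row in rows:
    print(row)  # (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)
```

The six rows follow the same order as the first expand_grid result in this notebook.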

import numpy as np
import pandas as pd
from janitor import expand_grid

data = {"x": [1, 2, 3], "y": [1, 2]}

result = expand_grid(others=data)

result

	x	y
	0	0
0	1	1
1	1	2
2	2	1
3	2	2
4	3	1
5	3	2

# combination of letters

data = {"l1": list("abcde"), "l2": list("ABCDE")}

letters = expand_grid(others=data)

letters.head(10)

	l1	l2
	0	0
0	a	A
1	a	B
2	a	C
3	a	D
4	a	E
5	b	A
6	b	B
7	b	C
8	b	D
9	b	E

data = {"height": [60, 70], "weight": [100, 140, 180], "sex": ["Male", "Female"]}

result = expand_grid(others=data)

result

	height	weight	sex
	0	0	0
0	60	100	Male
1	60	100	Female
2	60	140	Male
3	60	140	Female
4	60	180	Male
5	60	180	Female
6	70	100	Male
7	70	100	Female
8	70	140	Male
9	70	140	Female
10	70	180	Male
11	70	180	Female

# A dictionary of arrays
# Arrays must be 1-dimensional or 2-dimensional

data = {"x1": np.array([[1, 3], [2, 4]]), "x2": np.array([[5, 7], [6, 8]])}

result = expand_grid(others=data)

result

	x1		x2
	0	1	0	1
0	1	3	5	7
1	1	3	6	8
2	2	4	5	7
3	2	4	6	8
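The output above suggests that each row of a 2-D array is kept together as one unit when the combinations are built. A rough standard-library sketch of that behaviour, using nested lists in place of NumPy arrays (an assumption made to keep the example dependency-free), could look like this:

```python
from itertools import product

# Nested lists stand in for the 2-D arrays x1 and x2 used above
x1 = [[1, 3], [2, 4]]
x2 = [[5, 7], [6, 8]]

# Combine whole rows: every row of x1 is paired with every row of x2
combined = [tuple(a) + tuple(b) for a, b in product(x1, x2)]

for row in combined:
    print(row)
# (1, 3, 5, 7)
# (1, 3, 6, 8)
# (2, 4, 5, 7)
# (2, 4, 6, 8)
```

This is only an illustration of the pairing rule, not the library's own code path.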

# expand_grid can also be method chained
# onto an existing dataframe

df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
data = {"z": [1, 2, 3]}

# a key has to be passed in for the dataframe;
# it becomes the top level of that dataframe's column names

result = df.expand_grid(df_key="df", others=data)

result

	df		z
	x	y	0
0	1	2	1
1	1	2	2
2	1	2	3
3	2	1	1
4	2	1	2
5	2	1	3

# expand_grid can work on multiple dataframes
# Ensure that there are keys
# for each dataframe in the dictionary

df1 = pd.DataFrame({"x": range(1, 3), "y": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"x": [2, 3], "y": ["a", "b"]})

data = {"df1": df1, "df2": df2, "df3": df3}

result = expand_grid(others=data)

result

	df1		df2		df3
	x	y	x	y	x	y
0	1	2	1	3	2	a
1	1	2	1	3	3	b
2	1	2	2	2	2	a
3	1	2	2	2	3	b
4	1	2	3	1	2	a
5	1	2	3	1	3	b
6	2	1	1	3	2	a
7	2	1	1	3	3	b
8	2	1	2	2	2	a
9	2	1	2	2	3	b
10	2	1	3	1	2	a
11	2	1	3	1	3	b

Columns can be flattened with pyjanitor's collapse_levels:

result.collapse_levels()

	df1_x	df1_y	df2_x	df2_y	df3_x	df3_y
0	1	2	1	3	2	a
1	1	2	1	3	3	b
2	1	2	2	2	2	a
3	1	2	2	2	3	b
4	1	2	3	1	2	a
5	1	2	3	1	3	b
6	2	1	1	3	2	a
7	2	1	1	3	3	b
8	2	1	2	2	2	a
9	2	1	2	2	3	b
10	2	1	3	1	2	a
11	2	1	3	1	3	b

Alternatively, a column level can be dropped with pandas' droplevel method:

letters.droplevel(level=-1, axis="columns").head(10)

	l1	l2
0	a	A
1	a	B
2	a	C
3	a	D
4	a	E
5	b	A
6	b	B
7	b	C
8	b	D
9	b	E
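For comparison, the flattened multi-dataframe result shown earlier can be approximated with plain pandas. The sketch below assumes pandas 1.2 or later, where merge accepts how="cross"; the names frames, prefixed and crossed are illustrative and do not come from pyjanitor.

```python
import pandas as pd

df1 = pd.DataFrame({"x": range(1, 3), "y": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"x": [2, 3], "y": ["a", "b"]})

frames = {"df1": df1, "df2": df2, "df3": df3}

# Prefix each column with its dictionary key so the names stay unique
prefixed = [frame.add_prefix(f"{key}_") for key, frame in frames.items()]

# Chain cross merges: every row of the left frame meets every row of the right
crossed = prefixed[0]
for frame in prefixed[1:]:
    crossed = crossed.merge(frame, how="cross")

print(crossed.shape)
print(list(crossed.columns))
```

This should give 2 x 3 x 2 = 12 rows and the same six column names as the collapse_levels output, without the intermediate MultiIndex columns.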
