expand_grid: Create a dataframe from all combinations of inputs
Background
This notebook shows examples of how expand_grid works. expand_grid aims to offer functionality similar to the expand_grid function in R's tidyr package.
expand_grid creates a dataframe from all combinations of its inputs.
The inputs must be provided as a dictionary. If a dataframe is one of the inputs, a key must be provided for it as well.
Some of the examples used here are from tidyr's expand_grid page and from the pandas cookbook.
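To make "all combinations of inputs" concrete, here is a minimal sketch that uses only the standard library. It is not how pyjanitor implements expand_grid; the names inputs, names, and rows are illustrative, and the row order shown is that of itertools.product, which is not guaranteed to match expand_grid's output order.

```python
from itertools import product

# Illustrative inputs: two named lists of values.
inputs = {"x": [1, 2, 3], "y": [1, 2]}
names = list(inputs)

# Pair every value of "x" with every value of "y".
rows = [dict(zip(names, combo)) for combo in product(*inputs.values())]

for row in rows:
    print(row)
```

Each printed dictionary corresponds to one row of the combined table: 3 values of x times 2 values of y gives 6 rows.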
import numpy as np
import pandas as pd
from janitor import expand_grid
data = {"x": [1, 2, 3], "y": [1, 2]}
result = expand_grid(others=data)
result
# combination of letters
data = {"l1": list("abcde"), "l2": list("ABCDE")}
letters = expand_grid(others=data)
letters.head(10)
data = {"height": [60, 70], "weight": [100, 140, 180], "sex": ["Male", "Female"]}
result = expand_grid(others=data)
result
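Since every value of each input is paired with every value of the others, the number of combinations is the product of the input lengths. The sketch below checks this for the height/weight/sex inputs using only the standard library; expected_rows and combos are illustrative names, and the code counts combinations rather than calling expand_grid.

```python
from itertools import product
from math import prod

data = {"height": [60, 70], "weight": [100, 140, 180], "sex": ["Male", "Female"]}

# Product of the input lengths: 2 * 3 * 2.
expected_rows = prod(len(values) for values in data.values())

# Enumerate the combinations explicitly and count them.
combos = list(product(*data.values()))

print(expected_rows, len(combos))
```

Both numbers are 12, so a full grid over these three inputs has 12 rows.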
# A dictionary of arrays
# Arrays must be 1- or 2-dimensional
data = {"x1": np.array([[1, 3], [2, 4]]), "x2": np.array([[5, 7], [6, 8]])}
result = expand_grid(others=data)
result
# expand_grid can also be method-chained
# onto an existing dataframe
df = pd.DataFrame({"x": [1, 2], "y": [2, 1]})
data = {"z": [1, 2, 3]}
# A key must be passed in for the dataframe (df_key);
# it is added to the dataframe's column names
result = df.expand_grid(df_key="df", others=data)
result
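Conceptually, crossing a dataframe with another input pairs every row of the dataframe with every value of that input. The following is a rough standard-library sketch of that idea, with the dataframe represented as a list of row dictionaries. The names df_rows, z_values, and crossed are illustrative, and the sketch does not reproduce how expand_grid labels columns with the key.

```python
from itertools import product

# Rows of the dataframe above, written as plain dictionaries.
df_rows = [{"x": 1, "y": 2}, {"x": 2, "y": 1}]
z_values = [1, 2, 3]

# Pair every dataframe row with every value of z.
crossed = [{**row, "z": z} for row, z in product(df_rows, z_values)]

for row in crossed:
    print(row)
```

Two dataframe rows crossed with three values of z give six combined rows.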
# expand_grid can work on multiple dataframes
# Ensure that there are keys
# for each dataframe in the dictionary
df1 = pd.DataFrame({"x": range(1, 3), "y": [2, 1]})
df2 = pd.DataFrame({"x": [1, 2, 3], "y": [3, 2, 1]})
df3 = pd.DataFrame({"x": [2, 3], "y": ["a", "b"]})
data = {"df1": df1, "df2": df2, "df3": df3}
result = expand_grid(others=data)
result
Columns can be flattened with pyjanitor's collapse_levels:
result.collapse_levels()
Or a level can be dropped with pandas' droplevel method:
letters.droplevel(level=-1, axis="columns").head(10)