Conditional join
Joining on Non-Equi Operators
import janitor
import pandas as pd
# https://stackoverflow.com/q/61948103/7175713
df1 = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3], "value_1": [2, 5, 7, 1, 3, 4]})
df2 = pd.DataFrame(
{
"id": [1, 1, 1, 1, 2, 2, 2, 3],
"value_2A": [0, 3, 7, 12, 0, 2, 3, 1],
"value_2B": [1, 5, 9, 15, 1, 4, 6, 3],
}
)
df1
df2
Join on equi and non-equi operators is possible:
df1.conditional_join(
df2,
("id", "id", "=="),
("value_1", "value_2A", ">="),
("value_1", "value_2B", "<="),
sort_by_appearance=True,
)
The default join is inner. left and right joins are supported as well:
df1.conditional_join(
df2,
("id", "id", "=="),
("value_1", "value_2A", ">="),
("value_1", "value_2B", "<="),
how="left",
sort_by_appearance=True,
)
df1.conditional_join(
df2,
("id", "id", "=="),
("value_1", "value_2A", ">="),
("value_1", "value_2B", "<="),
how="right",
sort_by_appearance=True,
)
Join on just the non-equi joins is also possible:
df1.conditional_join(
df2,
("value_1", "value_2A", ">"),
("value_1", "value_2B", "<"),
how="inner",
sort_by_appearance=True,
)
Join on just equality is also possible, but should be avoided, as Pandas' merge/join is more efficient (it uses Pandas' internal merge functions anyway);
df1.conditional_join(df2, ("id", "id", "=="))
Join on not equal -> !=
df1.conditional_join(df2, ("id", "id", "!="))
If the columns from both dataframes have nothing in common, a single indexed column is returned:
(
df1.select_columns("value_1").conditional_join(
df2.select_columns("val*"),
("value_1", "value_2A", ">"),
("value_1", "value_2B", "<"),
)
)