Expose explicitly missing values with complete
import janitor
import numpy as np
import pandas as pd
# from http://imachordata.com/2016/02/05/you-complete-me/
df = pd.DataFrame(
{
"Year": [1999, 2000, 2004, 1999, 2004],
"Taxon": [
"Saccharina",
"Saccharina",
"Saccharina",
"Agarum",
"Agarum",
],
"Abundance": [4, 5, 2, 1, 8],
}
)
df
Note that Year 2000 and Agarum pairing is missing in the DataFrame above. Let’s make it explicit:
df.complete("Year", "Taxon")
# A better viewing based on order
df.complete("Year", "Taxon", sort=True)
What if we wanted the explicit missing values for all the years from 1999 to 2004? Easy - simply pass a dictionary pairing the column name with the new values:
new_year_values = {"Year": range(df.Year.min(), df.Year.max() + 1)}
df.complete(new_year_values, "Taxon")
You can pass a callable as values in the dictionary:
def new_year_values(year):
return range(year.min(), year.max() + 1)
df.complete({"Year": new_year_values}, "Taxon", sort=True)
You can get explcit rows, based only on existing data:
# https://stackoverflow.com/q/62266057/7175713
df = {
"Name": ("Bob", "Bob", "Emma"),
"Age": (23, 23, 78),
"Gender": ("Male", "Male", "Female"),
"Item": ("house", "car", "house"),
"Value": (5, 1, 3),
}
df = pd.DataFrame(df)
df
In the DataFrame above, there is no car
Item value for the Name
, Age
, Gender
combination -> (Emma, 78, Female)
. Pass (Name, Age, Gender)
and Item
to explicitly expose the missing row:
df.complete(("Name", "Age", "Gender"), "Item")
The example above showed how to expose missing rows on a group basis. There is also the option of exposing missing rows with the by
parameter:
df = pd.DataFrame(
{
"state": ["CA", "CA", "HI", "HI", "HI", "NY", "NY"],
"year": [2010, 2013, 2010, 2012, 2016, 2009, 2013],
"value": [1, 3, 1, 2, 3, 2, 5],
}
)
df
Let's expose all the missing years, based on the minimum and maximum year, for each state:
result = df.complete({"year": new_year_values}, by="state", sort=True)
result
You can fill the nulls with Pandas' fillna
:
result.fillna(0, downcast="infer")