Processing Bird Call Data
Background
The following example was obtained by translating the R code from TidyTuesday 2019-04-30 to Python using pandas and pyjanitor. It provides a simple example of using pyjanitor for:
- column renaming
- column name cleaning
- dataframe merging
The data originates from a study of the effects of artificial light on bird behaviour. It is the Chicago-area subset of the original study's data.
Citations
This data set originates from the publication:
Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Proceedings of the Royal Society B 286(1900): 20190364. https://doi.org/10.1098/rspb.2019.0364
To reference only the data, please cite the Dryad data package:
Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Data from: Nocturnal flight-calling behaviour predicts vulnerability to artificial light in migratory birds. Dryad Digital Repository. https://doi.org/10.5061/dryad.8rr0498
import janitor
import pandas as pd
Get Raw Data
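We use pandas to read the three CSV files. The bird call file is read with sep=" " because its fields are separated by spaces rather than commas.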
raw_birds = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Chicago_collision_data.csv"
)
raw_call = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/bird_call.csv",
    sep=" ",
)
raw_light = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/raw/Light_levels_dryad.csv"
)
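To illustrate why the separator matters, here is a minimal sketch that parses a small in-memory sample with and without sep=" ". The sample text and its column names are made up for illustration and do not reproduce the real file.

```python
import io

import pandas as pd

# Hypothetical space-delimited sample (not the real bird_call.csv contents).
text = "Species Family Call\nMelospiza melodia Yes\nTurdus migratorius No\n"

# With the default comma separator, each line is parsed as a single field.
wrong = pd.read_csv(io.StringIO(text))

# With sep=" ", each space-separated token becomes its own column.
right = pd.read_csv(io.StringIO(text), sep=" ")

print(wrong.shape)          # one column holding the whole line
print(list(right.columns))  # three separate columns
```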
Original DataFrames
Taking a quick look at the three imported (raw) pandas dataframes.
raw_birds.head()
raw_call.head()
raw_light.head()
Cleaning Data Using Pyjanitor
Pyjanitor adds extra methods to standard pandas dataframe objects. The clean_names() method is one example: it lowercases all column names and replaces spaces with underscores.
clean_light = raw_light.clean_names()
clean_light.head()
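For readers who want to see what this kind of cleaning amounts to, the following plain-pandas sketch approximates only the lowercasing and space-to-underscore steps. It is not pyjanitor's implementation, and the column names are hypothetical.

```python
import pandas as pd

# Hypothetical column names, chosen only to show the transformation.
df = pd.DataFrame({"Date Time": [1], "Light Score": [2]})

# Lowercase every column name, then replace spaces with underscores.
df.columns = df.columns.str.lower().str.replace(" ", "_", regex=False)

print(list(df.columns))  # ['date_time', 'light_score']
```

The pyjanitor method has additional options, so treat this only as a rough mental model.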
Pyjanitor encourages writing the cleaning process as a chain of method calls. Here we chain two column renames. Because the column names are inconsistent across our dataframes, we rename columns in the raw_call dataframe.
clean_call = (
    raw_call.rename_column("Species", "Genus")  # rename 'Species' column to 'Genus'
    .rename_column("Family", "Species")  # rename 'Family' column to 'Species'
)
clean_call.head()
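The order of the two chained renames matters, because the second rename reuses the name "Species". The sketch below shows this with the standard pandas rename method on a hypothetical one-row dataframe; it is an illustration, not part of the original analysis.

```python
import pandas as pd

# Hypothetical one-row frame with the same two column names as raw_call.
df = pd.DataFrame({"Species": ["Melospiza"], "Family": ["melodia"]})

# Same order as the chain above: Species -> Genus, then Family -> Species.
chained = df.rename(columns={"Species": "Genus"}).rename(columns={"Family": "Species"})

# A single mapping applies both renames at once and gives the same result.
single = df.rename(columns={"Species": "Genus", "Family": "Species"})

# Reversing the order first creates two 'Species' columns,
# and the second rename then turns both of them into 'Genus'.
reversed_order = df.rename(columns={"Family": "Species"}).rename(columns={"Species": "Genus"})

print(list(chained.columns))         # ['Genus', 'Species']
print(list(reversed_order.columns))  # ['Genus', 'Genus']
```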
We can chain as many standard pandas commands as we like, along with any pyjanitor specific methods.
clean_birds = (
    raw_birds.merge(clean_call, how="left")  # left-merge raw_birds with the clean_call dataframe
    .select_columns(
        [
            "Genus",
            "Species",
            "Date",
            "Locality",
            "Collisions",
            "Call",
            "Habitat",
            "Stratum",
        ]
    )  # keep only the listed columns
    .clean_names()
    .rename_column("collisions", "family")  # rename 'collisions' column to 'family' in merged dataframe
    .rename_column("call", "flight_call")  # rename 'call' column to 'flight_call'
    .dropna()  # drop all rows which contain a NaN
)
clean_birds.head()
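The merge above does not name its key columns. When no key is given, pandas joins on the columns the two dataframes have in common, and a left merge keeps every row of the left dataframe, filling unmatched values with NaN. The following sketch shows that behaviour, and the effect of dropna(), on two small hypothetical dataframes.

```python
import pandas as pd

# Hypothetical stand-ins for the collision and call dataframes.
left = pd.DataFrame(
    {
        "Genus": ["A", "B"],
        "Species": ["x", "y"],
        "Date": ["2019-01-01", "2019-01-02"],
    }
)
right = pd.DataFrame({"Genus": ["A"], "Species": ["x"], "Call": ["Yes"]})

# No 'on' argument: the shared columns (Genus, Species) are used as keys.
merged = left.merge(right, how="left")

# The B/y row has no match in 'right', so its Call value is NaN
# and dropna() removes that row.
kept = merged.dropna()

print(merged)
print(kept)
```

If silently dropping unmatched rows is not what you want, inspecting merged before calling dropna() shows which rows would be lost.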