import janitor  # registers pyjanitor methods such as clean_names on DataFrames
import pandas as pd
import seaborn as sns

Processing Board Game Data

Background

This dataset comes from the Board Game Geek database. The site's database contains more than 90,000 games with crowd-sourced ratings. This subset is limited to games with at least 50 ratings that were published between 1950 and 2016, which still leaves us with 10,532 games! For more information, see the tidytuesday repository, from which this example was taken.

Data Cleaning

%matplotlib inline

One-Shot

This cell demonstrates the cleaning process using the method-chaining approach championed by pyjanitor.

cleaned_df = (
    pd.read_csv(
        "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-12/board_games.csv"  # noqa: E501
    )  # ingest raw data
    .clean_names()  # removes whitespace, punctuation/symbols, capitalization
    .remove_empty()  # removes entirely empty rows / columns
    .drop(
        columns=["image", "thumbnail", "compilation", "game_id"]
    )  # drops unnecessary columns
)
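The chain above works because each step returns a new DataFrame that the next method is called on. A minimal sketch of the same style in plain pandas, using `DataFrame.pipe` with two hypothetical helper functions (`lower_names`, `drop_all_empty`) that only roughly approximate `clean_names` and `remove_empty`, applied to a small made-up frame so it runs offline:

```python
import numpy as np
import pandas as pd


# Hypothetical helpers standing in for pyjanitor's methods; they only
# approximate what clean_names/remove_empty do.
def lower_names(df: pd.DataFrame) -> pd.DataFrame:
    """Lowercase column names and replace spaces with underscores."""
    return df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))


def drop_all_empty(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows, then columns, that contain only missing values."""
    return df.dropna(how="all").dropna(axis=1, how="all")


# Toy data (made up for illustration): one fully empty row, one fully empty column.
raw = pd.DataFrame(
    {
        "Game ID": [1, 2, np.nan],
        "Name": ["Die Macher", "Dragonmaster", np.nan],
        "Image": ["a.jpg", "b.jpg", np.nan],
        "Empty": [np.nan, np.nan, np.nan],
    }
)

toy_clean = (
    raw.pipe(lower_names)  # normalize column names
    .pipe(drop_all_empty)  # remove fully empty rows/columns
    .drop(columns=["image", "game_id"])  # drop columns we do not need
)
print(toy_clean)
```

`pipe` lets ordinary functions take part in a chain without registering them as DataFrame methods, which is what pyjanitor does for its own functions on import.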

Multi-Step

These cells repeat the process step by step in order to explain it in more detail.

Read in the CSV

df = pd.read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-03-12/board_games.csv"  # noqa: E501
)
df.head(3)

	game_id	description	image	max_players	max_playtime	min_age	min_players	min_playtime	name	playing_time	...	artist	category	compilation	designer	expansion	family	mechanic	publisher	average_rating	users_rated
0	1	Die Macher is a game about seven sequential po...	//cf.geekdo-images.com/images/pic159509.jpg	5	240	14	3	240	Die Macher	240	...	Marcus Gschwendtner	Economic,Negotiation,Political	NaN	Karl-Heinz Schmiel	NaN	Country: Germany,Valley Games Classic Line	Area Control / Area Influence,Auction/Bidding,...	Hans im Glück Verlags-GmbH,Moskito Spiele,Vall...	7.66508	4498
1	2	Dragonmaster is a trick-taking card game based...	//cf.geekdo-images.com/images/pic184174.jpg	4	30	12	3	30	Dragonmaster	30	...	Bob Pepper	Card Game,Fantasy	NaN	G. W. "Jerry" D'Arcey	NaN	Animals: Dragons	Trick-taking	E.S. Lowe,Milton Bradley	6.60815	478
2	3	Part of the Knizia tile-laying trilogy, Samura...	//cf.geekdo-images.com/images/pic3211873.jpg	4	60	10	2	30	Samurai	60	...	Franz Vohwinkel	Abstract Strategy,Medieval	NaN	Reiner Knizia	NaN	Asian Theme,Country: Japan,Knizia tile-laying ...	Area Control / Area Influence,Hand Management,...	999 Games,ABACUSSPIELE,Astrel Games,Ceilikan J...	7.44119	12019

3 rows × 22 columns

Remove whitespace, punctuation/symbols, and capitalization from the column names

df = df.clean_names()
df.head(3)

	game_id	description	image	max_players	max_playtime	min_age	min_players	min_playtime	name	playing_time	...	artist	category	compilation	designer	expansion	family	mechanic	publisher	average_rating	users_rated
0	1	Die Macher is a game about seven sequential po...	//cf.geekdo-images.com/images/pic159509.jpg	5	240	14	3	240	Die Macher	240	...	Marcus Gschwendtner	Economic,Negotiation,Political	NaN	Karl-Heinz Schmiel	NaN	Country: Germany,Valley Games Classic Line	Area Control / Area Influence,Auction/Bidding,...	Hans im Glück Verlags-GmbH,Moskito Spiele,Vall...	7.66508	4498
1	2	Dragonmaster is a trick-taking card game based...	//cf.geekdo-images.com/images/pic184174.jpg	4	30	12	3	30	Dragonmaster	30	...	Bob Pepper	Card Game,Fantasy	NaN	G. W. "Jerry" D'Arcey	NaN	Animals: Dragons	Trick-taking	E.S. Lowe,Milton Bradley	6.60815	478
2	3	Part of the Knizia tile-laying trilogy, Samura...	//cf.geekdo-images.com/images/pic3211873.jpg	4	60	10	2	30	Samurai	60	...	Franz Vohwinkel	Abstract Strategy,Medieval	NaN	Reiner Knizia	NaN	Asian Theme,Country: Japan,Knizia tile-laying ...	Area Control / Area Influence,Hand Management,...	999 Games,ABACUSSPIELE,Astrel Games,Ceilikan J...	7.44119	12019

3 rows × 22 columns
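The output is unchanged because this CSV's column names are already lowercase snake_case. To see what kind of transformation `clean_names` performs on messier headers, here is a rough approximation written with the standard library and pandas. `approx_clean_names` is a hypothetical helper, not pyjanitor's implementation; the real function has options and may treat some characters differently.

```python
import re

import pandas as pd


def approx_clean_names(df: pd.DataFrame) -> pd.DataFrame:
    """Rough, hypothetical approximation of janitor's clean_names:
    lowercase, turn runs of non-alphanumeric characters into "_",
    and strip leading/trailing underscores."""

    def clean(name) -> str:
        name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).lower())
        return name.strip("_")

    return df.rename(columns=clean)


# Made-up messy headers for illustration.
messy = pd.DataFrame(columns=["Game ID", "Average Rating (BGG)", "users_rated"])
print(list(approx_clean_names(messy).columns))
# ['game_id', 'average_rating_bgg', 'users_rated']
```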

Remove all empty rows and columns, if any are present

df = df.remove_empty()
df.head(3)

	game_id	description	image	max_players	max_playtime	min_age	min_players	min_playtime	name	playing_time	...	artist	category	compilation	designer	expansion	family	mechanic	publisher	average_rating	users_rated
0	1	Die Macher is a game about seven sequential po...	//cf.geekdo-images.com/images/pic159509.jpg	5	240	14	3	240	Die Macher	240	...	Marcus Gschwendtner	Economic,Negotiation,Political	NaN	Karl-Heinz Schmiel	NaN	Country: Germany,Valley Games Classic Line	Area Control / Area Influence,Auction/Bidding,...	Hans im Glück Verlags-GmbH,Moskito Spiele,Vall...	7.66508	4498
1	2	Dragonmaster is a trick-taking card game based...	//cf.geekdo-images.com/images/pic184174.jpg	4	30	12	3	30	Dragonmaster	30	...	Bob Pepper	Card Game,Fantasy	NaN	G. W. "Jerry" D'Arcey	NaN	Animals: Dragons	Trick-taking	E.S. Lowe,Milton Bradley	6.60815	478
2	3	Part of the Knizia tile-laying trilogy, Samura...	//cf.geekdo-images.com/images/pic3211873.jpg	4	60	10	2	30	Samurai	60	...	Franz Vohwinkel	Abstract Strategy,Medieval	NaN	Reiner Knizia	NaN	Asian Theme,Country: Japan,Knizia tile-laying ...	Area Control / Area Influence,Hand Management,...	999 Games,ABACUSSPIELE,Astrel Games,Ceilikan J...	7.44119	12019

3 rows × 22 columns
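The frame still has 22 columns, so the board-game data contains no entirely empty columns; a sparse column such as "compilation" survives because it is not empty everywhere. A minimal plain-pandas sketch of the same idea on made-up data, using `dropna(how="all")` along each axis (an approximation of `remove_empty`, not its actual implementation):

```python
import numpy as np
import pandas as pd

# Toy frame (made up): row 1 is entirely empty, "compilation" is entirely empty.
df_demo = pd.DataFrame(
    {
        "name": ["Samurai", np.nan, "Dragonmaster"],
        "compilation": [np.nan, np.nan, np.nan],
        "min_age": [10, np.nan, 12],
    }
)

# Drop rows where every value is missing, then columns where every value is missing.
no_empty = df_demo.dropna(how="all").dropna(axis=1, how="all")
print(no_empty.shape)  # (2, 2)
```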

Check whether the "min_playtime" and "max_playtime" columns are equal by counting the rows in which they differ

len(df[df["min_playtime"] != df["max_playtime"]])
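Summing the boolean mask gives the same count without building an intermediate DataFrame. One subtlety worth knowing: `NaN != NaN` evaluates to `True`, so rows where both playtimes are missing are counted as "different". A small sketch on made-up values:

```python
import numpy as np
import pandas as pd

# Toy data (made up): two equal rows, one differing row, one all-missing row.
playtimes = pd.DataFrame(
    {
        "min_playtime": [240, 30, 30, np.nan],
        "max_playtime": [240, 30, 60, np.nan],
    }
)

# Boolean Series: True where the two columns differ.
differs = playtimes["min_playtime"] != playtimes["max_playtime"]
print(differs.sum())  # 2: includes the all-NaN row, because NaN != NaN is True

# Treat rows where both values are missing as equal instead.
both_missing = playtimes["min_playtime"].isna() & playtimes["max_playtime"].isna()
print((differs & ~both_missing).sum())  # 1
```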

Check what fraction of the values in the "compilation" column are not null

len(df[df["compilation"].notnull()]) / len(df)

0.03892897835169009
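So only about 3.9% of games have a "compilation" value. Because `True` counts as 1 and `False` as 0, the mean of a boolean mask is exactly this fraction, and the same expression extends to every column at once. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration).
demo = pd.DataFrame(
    {
        "compilation": [np.nan, "Some Compilation", np.nan, np.nan],
        "expansion": [np.nan, np.nan, "Expansion A", "Expansion B"],
    }
)

# Mean of a boolean mask = fraction of True values,
# equivalent to len(df[mask]) / len(df).
print(demo["compilation"].notnull().mean())  # 0.25

# Same idea for every column, expressed as percentages.
print((demo.notnull().mean() * 100).round(1))
```

On the real data, `df.notnull().mean()` gives a quick overview of which columns are too sparse to be useful.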

Drop unnecessary columns

The "compilation" column is non-null for fewer than 4% of games, so it adds little value. The "image" and "thumbnail" columns link to images and play no part in this analysis, and the "game_id" column can be replaced by the DataFrame index.

df = df.drop(columns=["image", "thumbnail", "compilation", "game_id"])
df.head(3)

	description	max_players	max_playtime	min_age	min_players	min_playtime	name	playing_time	year_published	artist	category	designer	expansion	family	mechanic	publisher	average_rating	users_rated
0	Die Macher is a game about seven sequential po...	5	240	14	3	240	Die Macher	240	1986	Marcus Gschwendtner	Economic,Negotiation,Political	Karl-Heinz Schmiel	NaN	Country: Germany,Valley Games Classic Line	Area Control / Area Influence,Auction/Bidding,...	Hans im Glück Verlags-GmbH,Moskito Spiele,Vall...	7.66508	4498
1	Dragonmaster is a trick-taking card game based...	4	30	12	3	30	Dragonmaster	30	1981	Bob Pepper	Card Game,Fantasy	G. W. "Jerry" D'Arcey	NaN	Animals: Dragons	Trick-taking	E.S. Lowe,Milton Bradley	6.60815	478
2	Part of the Knizia tile-laying trilogy, Samura...	4	60	10	2	30	Samurai	60	1998	Franz Vohwinkel	Abstract Strategy,Medieval	Reiner Knizia	NaN	Asian Theme,Country: Japan,Knizia tile-laying ...	Area Control / Area Influence,Hand Management,...	999 Games,ABACUSSPIELE,Astrel Games,Ceilikan J...	7.44119	12019
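In a notebook, re-running the `df = df.drop(...)` cell a second time raises a `KeyError`, because the columns are already gone. A small sketch on an empty made-up frame showing the behavior and the `errors="ignore"` option that makes the step safe to repeat:

```python
import pandas as pd

# Empty toy frame with the same column names (made up for illustration).
demo = pd.DataFrame(columns=["game_id", "image", "thumbnail", "compilation", "name"])
to_drop = ["image", "thumbnail", "compilation", "game_id"]

trimmed = demo.drop(columns=to_drop)
print(list(trimmed.columns))  # ['name']

# Dropping again would raise KeyError; errors="ignore" skips missing labels.
print(list(trimmed.drop(columns=to_drop, errors="ignore").columns))  # ['name']
```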

Sample Analysis

Which categories appear most often?

df["category"].value_counts().head(10)

Wargame,World War II          449
Card Game                     438
Abstract Strategy             284
Napoleonic,Wargame            124
Economic                      116
Card Game,Fantasy             110
Dice                          107
American Civil War,Wargame     97
Modern Warfare,Wargame         89
Party Game                     77
Name: category, dtype: int64
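Note that "category" holds comma-separated lists, so `value_counts` above counts whole combinations: "Wargame" appears in several separate entries. A sketch that counts individual labels instead, splitting on commas and giving each label its own row with `explode` (this assumes no single category label contains a comma itself):

```python
import pandas as pd

# A few values in the same format as the "category" column, plus a missing value.
categories = pd.Series(
    [
        "Wargame,World War II",
        "Card Game",
        "Napoleonic,Wargame",
        "Card Game,Fantasy",
        None,
    ],
    name="category",
)

# Split into lists, one label per row, then count labels (NaN is excluded).
label_counts = categories.str.split(",").explode().str.strip().value_counts()
print(label_counts)
```

On the full dataset the equivalent is `df["category"].str.split(",").explode().value_counts().head(10)`.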

How do a game's minimum number of players, recommended minimum age, and minimum playtime relate to the number of users who rated it?

sns.pairplot(
    df,
    x_vars=["min_age", "min_players", "min_playtime"],
    y_vars="users_rated",
    height=7,
    aspect=0.7,
);

Preliminary Analysis

Even without digging further into the data, it is apparent that some entries may have been entered incorrectly, e.g. a minimum playtime of 60,000 minutes (1,000 hours). Otherwise, the distributions look roughly bell-shaped.
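A natural next step is to flag suspicious playtimes for inspection. The sketch below uses made-up rows and a hypothetical cut-off of 24 hours (`MAX_PLAUSIBLE_MINUTES`); the threshold is a judgment call, since some very long wargames may legitimately report large values, so flagged rows are worth reviewing before anything is discarded.

```python
import pandas as pd

# Toy data (made up): the last row mimics an implausible playtime.
games = pd.DataFrame(
    {
        "name": ["Die Macher", "Dragonmaster", "Samurai", "Suspicious Entry"],
        "min_playtime": [240, 30, 30, 60000],
    }
)

# Hypothetical cut-off: treat any minimum playtime above 24 hours as suspect.
MAX_PLAUSIBLE_MINUTES = 24 * 60

suspect = games[games["min_playtime"] > MAX_PLAUSIBLE_MINUTES]
print(suspect)

# One option: replace suspect values with NaN instead of dropping whole rows,
# so the rest of each game's data is kept.
games["min_playtime"] = games["min_playtime"].mask(
    games["min_playtime"] > MAX_PLAUSIBLE_MINUTES
)
```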