row_to_names: Elevates a row to be the column names of a DataFrame

Background

This notebook shows a brief example of how to replace a dataframe's column names with the values from one of its rows.

from io import StringIO

import janitor
import pandas as pd
data = """shoe, 220, 100
          shoe, 450, 40
          item, retail_price, cost
          shoe, 200, 38
          bag, 305, 25
       """
temp = pd.read_csv(StringIO(data), header=None)
temp
      0             1     2
0  shoe           220   100
1  shoe           450    40
2  item  retail_price  cost
3  shoe           200    38
4   bag           305    25

Looking at the dataframe above, we would like to use row 2 as the column names. One way to achieve this takes four steps:

  1. Use loc/iloc to assign row 2 to columns.
  2. Strip off any whitespace from the new column names.
  3. Drop row 2 from the dataframe using the drop method.
  4. Set the columns axis name to None (assigning the row leaves its label, 2, as the axis name).
temp.columns = temp.iloc[2, :]
temp.columns = temp.columns.str.strip()
temp = temp.drop(2, axis=0)
temp = temp.rename_axis(None, axis="columns")
temp
   item retail_price cost
0  shoe          220  100
1  shoe          450   40
3  shoe          200   38
4   bag          305   25

However, the first two steps assign to temp.columns, which prevents method chaining. The row_to_names function resolves this:

df = pd.read_csv(StringIO(data), header=None).row_to_names(
    row_number=2, remove_row=True
)

df
/tmp/pyjanitor-examples_env/lib/python3.9/site-packages/janitor/functions/row_to_names.py:173: UserWarning: The function row_to_names will, in the official 1.0 release, change its behaviour to reset the dataframe's index by default. You can prepare for this change right now by explicitly setting `reset_index=True` when calling on `row_to_names`.
  warnings.warn(

   item retail_price cost
0  shoe          220  100
1  shoe          450   40
3  shoe          200   38
4   bag          305   25
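
For comparison, here is a minimal sketch of how the same row promotion might be written in plain pandas so that it still chains, including the index reset that the warning above mentions. The helper name promote_row_to_header and its parameters are hypothetical, chosen for illustration. This is not pyjanitor's implementation, and the sample data is written without the extra indentation to keep the example small.

```python
from io import StringIO

import pandas as pd


def promote_row_to_header(df, row_number, reset_index=False):
    """Hypothetical helper (not part of pyjanitor): use one row as the header."""
    # Take the chosen row's values, stripped of surrounding whitespace, as names.
    new_names = df.iloc[row_number].astype(str).str.strip().tolist()
    # Drop that row by position, then assign the new column names.
    out = df.drop(index=df.index[row_number]).set_axis(new_names, axis="columns")
    # Optionally renumber the remaining rows from 0.
    return out.reset_index(drop=True) if reset_index else out


data = "shoe,220,100\nshoe,450,40\nitem,retail_price,cost\nshoe,200,38\nbag,305,25\n"

# pipe passes the dataframe as the first argument, so the chain is not broken.
result = pd.read_csv(StringIO(data), header=None).pipe(
    promote_row_to_header, row_number=2, reset_index=True
)
print(result)
```

With reset_index=True the remaining rows are renumbered from 0 instead of keeping the gap left by the dropped row. The values are still strings, because the header row was read as data alongside the numbers. Something like pd.to_numeric could convert the numeric columns afterwards.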