Using sort_naturally

import janitor
import pandas as pd
import pandas_flavor as pf

Let's say we have a pandas DataFrame that contains wells that we need to sort alphanumerically.

data = {
    "Well": ["A21", "A3", "A21", "B2", "B51", "B12"],
    "Value": [1, 2, 13, 3, 4, 7],
}
df = pd.DataFrame(data)
df
Well Value
0 A21 1
1 A3 2
2 A21 13
3 B2 3
4 B51 4
5 B12 7

A human would sort it in the order:

A3, A21, A21, B2, B12, B51

However, the default sort in pandas compares strings character by character, so it doesn't produce that order:

df.sort_values("Well")
Well Value
0 A21 1
2 A21 13
1 A3 2
5 B12 7
3 B2 3
4 B51 4
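
The same ordering shows up in plain Python, which makes the cause easier to see. The snippet below is a small illustration using only the built-in `sorted` and string comparison on the well names from the example; it does not involve pandas or pyjanitor.

```python
# Well names from the example DataFrame.
wells = ["A21", "A3", "A21", "B2", "B51", "B12"]

# Built-in string sorting is lexicographic: strings are compared one
# character at a time, so "A21" vs "A3" is decided by "2" < "3".
lexicographic = sorted(wells)
print(lexicographic)

# The numeric value of the suffix (21 vs 3) is never considered.
print("A21" < "A3")
```

Because the comparison stops at the first differing character, a longer number that starts with a smaller digit always sorts first.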

Lexicographic sorting doesn't get us to where we want. A21 shouldn't come before A3, and B12 shouldn't come before B2. How might we fix this?

df.sort_naturally("Well")
Well Value
1 A3 2
0 A21 1
2 A21 13
3 B2 3
5 B12 7
4 B51 4
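
To build some intuition for what natural sorting does, here is a minimal sketch in plain Python. The helper `natural_key` is a hypothetical name introduced for illustration; it is one common way to get a natural order and is not necessarily how `sort_naturally` is implemented, so pyjanitor's behavior may differ on edge cases.

```python
import re


def natural_key(text):
    """Hypothetical helper: split text into alternating text and number parts."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "A21" -> ["A", "21", ""]. Digit runs always land at odd indices.
    parts = re.split(r"(\d+)", text)
    # Convert digit runs to int so that 3 < 21 compares numerically.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


wells = ["A21", "A3", "A21", "B2", "B51", "B12"]

# sorted() is stable, so the two "A21" entries keep their original order.
natural = sorted(wells, key=natural_key)
print(natural)
```

The key turns each well name into a list such as `["A", 21, ""]`, so the letter prefix is compared as text and the number is compared as an integer, which gives the order a human would expect.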

Now we're in sorting bliss! :)