Engineering

Engineering-specific data cleaning functions.

convert_units(df, column_name=None, existing_units=None, to_units=None, dest_column_name=None)

Converts a column of numeric values from one unit to another.

Unit conversion can only take place if the existing_units and to_units are of the same type (e.g., temperature or pressure). The provided unit types can be any unit name or alternate name provided in the unyt package's Listing of Units table.

Volume units are not provided natively in unyt. However, exponents are supported, and therefore some volume units can be converted. For example, a volume in cubic centimeters can be converted to cubic meters using existing_units='cm**3' and to_units='m**3'.

This method mutates the original DataFrame.

Examples:

>>> import pandas as pd
>>> import janitor.engineering
>>> df = pd.DataFrame({"temp_F": [-40, 112]})
>>> df = df.convert_units(
...     column_name='temp_F',
...     existing_units='degF',
...     to_units='degC',
...     dest_column_name='temp_C'
... )
>>> df
   temp_F     temp_C
0     -40 -40.000000
1     112  44.444444

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A pandas DataFrame. | required |
| column_name | str | Name of the column containing numeric values that are to be converted from one set of units to another. | None |
| existing_units | str | The unit type to convert from. | None |
| to_units | str | The unit type to convert to. | None |
| dest_column_name | str | The name of the new column containing the converted values that will be created. | None |

Raises:

| Type | Description |
|---|---|
| TypeError | If column is not numeric. |

Returns:

| Type | Description |
|---|---|
| DataFrame | A pandas DataFrame with a new column of unit-converted values. |

Source code in janitor/engineering.py
@pf.register_dataframe_method
def convert_units(
    df: pd.DataFrame,
    column_name: str = None,
    existing_units: str = None,
    to_units: str = None,
    dest_column_name: str = None,
) -> pd.DataFrame:
    """Converts a column of numeric values from one unit to another.

    Unit conversion can only take place if the `existing_units` and
    `to_units` are of the same type (e.g., temperature or pressure).
    The provided unit types can be any unit name or alternate name provided
    in the `unyt` package's [Listing of Units table](
    https://unyt.readthedocs.io/en/stable/unit_listing.html#unit-listing).

    Volume units are not provided natively in `unyt`.  However, exponents are
    supported, and therefore some volume units can be converted.  For example,
    a volume in cubic centimeters can be converted to cubic meters using
    `existing_units='cm**3'` and `to_units='m**3'`.

    This method mutates the original DataFrame.

    Examples:
        >>> import pandas as pd
        >>> import janitor.engineering
        >>> df = pd.DataFrame({"temp_F": [-40, 112]})
        >>> df = df.convert_units(
        ...     column_name='temp_F',
        ...     existing_units='degF',
        ...     to_units='degC',
        ...     dest_column_name='temp_C'
        ... )
        >>> df
           temp_F     temp_C
        0     -40 -40.000000
        1     112  44.444444

    Args:
        df: A pandas DataFrame.
        column_name: Name of the column containing numeric
            values that are to be converted from one set of units to another.
        existing_units: The unit type to convert from.
        to_units: The unit type to convert to.
        dest_column_name: The name of the new column containing the
            converted values that will be created.

    Raises:
        TypeError: If column is not numeric.

    Returns:
        A pandas DataFrame with a new column of unit-converted values.
    """

    # Check all inputs are correct data type
    check("column_name", column_name, [str])
    check("existing_units", existing_units, [str])
    check("to_units", to_units, [str])
    check("dest_column_name", dest_column_name, [str])

    # Check that column_name is a numeric column
    if not np.issubdtype(df[column_name].dtype, np.number):
        raise TypeError(f"{column_name} must be a numeric column.")

    original_vals = df[column_name].to_numpy() * unyt.Unit(existing_units)
    converted_vals = original_vals.to(to_units)
    df[dest_column_name] = np.array(converted_vals)

    return df