Engineering
Engineering-specific data cleaning functions.
convert_units(df, column_name=None, existing_units=None, to_units=None, dest_column_name=None)
Converts a column of numeric values from one unit to another.
Functional usage example:
Method chaining usage example:
```python
>>> import pandas as pd
>>> import janitor.engineering
>>> df = pd.DataFrame({"temp_F": [-40, 112]})
>>> df = df.convert_units(
...     column_name='temp_F',
...     existing_units='degF',
...     to_units='degC',
...     dest_column_name='temp_C'
... )
>>> df
   temp_F     temp_C
0     -40 -40.000000
1     112  44.444444
```
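The converted values in the example output can be checked by hand with the standard Fahrenheit-to-Celsius formula, C = (F - 32) × 5/9. The helper below is a hypothetical name used only for this check; it is not part of pyjanitor or `unyt`.

```python
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Hypothetical helper: standard Fahrenheit-to-Celsius formula."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Same inputs as the temp_F column in the example.
for temp_f in (-40, 112):
    print(temp_f, round(fahrenheit_to_celsius(temp_f), 6))
# -40 -40.0
# 112 44.444444
```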
Unit conversion can only take place if `existing_units` and `to_units` are of the same type (e.g., temperature or pressure).
The provided unit types can be any unit name or alternate name listed in the `unyt` package's [Listing of Units table](https://unyt.readthedocs.io/en/stable/unit_listing.html#unit-listing).
Volume units are not provided natively in `unyt`. However, exponents are supported, and therefore some volume units can be converted. For example, a volume in cubic centimeters can be converted to cubic meters using `existing_units='cm**3'` and `to_units='m**3'`.
Note: This method mutates the original DataFrame.
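The mutation comes from the assign-then-return pattern visible in the source (`df[dest_column_name] = ...` followed by `return df`). The sketch below reproduces that pattern with a hypothetical stand-in function, so it runs with pandas alone, and shows that passing a copy leaves the original untouched.

```python
import pandas as pd


def add_scaled_column(df, column_name, factor, dest_column_name):
    """Hypothetical stand-in that assigns a column in place, then returns df."""
    df[dest_column_name] = df[column_name] * factor
    return df


original = pd.DataFrame({"length_cm": [100.0, 250.0]})
result = add_scaled_column(original, "length_cm", 0.01, "length_m")
print(result is original)      # the returned object is the input itself
print(list(original.columns))  # the input gained the new column

# Passing a copy keeps the caller's DataFrame unchanged.
untouched = pd.DataFrame({"length_cm": [100.0, 250.0]})
safe = add_scaled_column(untouched.copy(), "length_cm", 0.01, "length_m")
print(list(untouched.columns))
```

The same `df.copy()` approach should apply when calling `convert_units`, assuming the original DataFrame needs to be preserved.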
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`df` | `DataFrame` | A pandas DataFrame. | required |
`column_name` | `str` | Name of the column containing numeric values that are to be converted from one set of units to another. | `None` |
`existing_units` | `str` | The unit type to convert from. | `None` |
`to_units` | `str` | The unit type to convert to. | `None` |
`dest_column_name` | `str` | The name of the new column containing the converted values that will be created. | `None` |
Returns:
Type | Description |
---|---|
`DataFrame` | A pandas DataFrame with a new column of unit-converted values. |
Exceptions:
Type | Description |
---|---|
`TypeError` | If `column_name` does not refer to a numeric column. |
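The numeric test in the source is `np.issubdtype(df[column_name].dtype, np.number)`. The sketch below runs that same check on a small, made-up DataFrame. `is_numeric_column` is a hypothetical wrapper, and the text column is created with `dtype=object` explicitly so the check receives a NumPy dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "temp_F": [-40, 112],
        "label": pd.Series(["low", "high"], dtype=object),
    }
)


def is_numeric_column(df: pd.DataFrame, column_name: str) -> bool:
    """Hypothetical wrapper around the dtype check used by convert_units."""
    return bool(np.issubdtype(df[column_name].dtype, np.number))


print(is_numeric_column(df, "temp_F"))  # integer column passes the check
print(is_numeric_column(df, "label"))   # object column would trigger the TypeError
```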
Source code in janitor/engineering.py
````python
@pf.register_dataframe_method
def convert_units(
    df: pd.DataFrame,
    column_name: str = None,
    existing_units: str = None,
    to_units: str = None,
    dest_column_name: str = None,
) -> pd.DataFrame:
    """
    Converts a column of numeric values from one unit to another.

    Functional usage example:

    Method chaining usage example:

    ```python
        >>> import pandas as pd
        >>> import janitor.engineering
        >>> df = pd.DataFrame({"temp_F": [-40, 112]})
        >>> df = df.convert_units(
        ...     column_name='temp_F',
        ...     existing_units='degF',
        ...     to_units='degC',
        ...     dest_column_name='temp_C'
        ... )
        >>> df
           temp_F     temp_C
        0     -40 -40.000000
        1     112  44.444444
    ```

    Unit conversion can only take place if the `existing_units` and
    `to_units` are of the same type (e.g., temperature or pressure).
    The provided unit types can be any unit name or alternate name provided
    in the `unyt` package's [Listing of Units table](
    https://unyt.readthedocs.io/en/stable/unit_listing.html#unit-listing).

    Volume units are not provided natively in `unyt`. However, exponents are
    supported, and therefore some volume units can be converted. For example,
    a volume in cubic centimeters can be converted to cubic meters using
    `existing_units='cm**3'` and `to_units='m**3'`.

    **Note**: This method mutates the original DataFrame.

    :param df: A pandas DataFrame.
    :param column_name: Name of the column containing numeric
        values that are to be converted from one set of units to another.
    :param existing_units: The unit type to convert from.
    :param to_units: The unit type to convert to.
    :param dest_column_name: The name of the new column containing the
        converted values that will be created.
    :returns: A pandas DataFrame with a new column of unit-converted values.
    :raises TypeError: if column is not numeric.
    """

    # Check all inputs are correct data type
    check("column_name", column_name, [str])
    check("existing_units", existing_units, [str])
    check("to_units", to_units, [str])
    check("dest_column_name", dest_column_name, [str])

    # Check that column_name is a numeric column
    if not np.issubdtype(df[column_name].dtype, np.number):
        raise TypeError(f"{column_name} must be a numeric column.")

    original_vals = df[column_name].to_numpy() * unyt.Unit(existing_units)
    converted_vals = original_vals.to(to_units)
    df[dest_column_name] = np.array(converted_vals)

    return df
````