
Utils

Miscellaneous internal PyJanitor helper functions.

check(varname, value, expected_types)

One-liner syntactic sugar for checking types. It can also check callables.

Example usage:

check('x', x, [int, float])

Parameters:

Name Type Description Default
varname str

The name of the variable (for diagnostic error message).

required
value

The value of the varname.

required
expected_types list

The type(s) the item is expected to be.

required

Exceptions:

Type Description
TypeError

if data is not the expected type.

Source code in janitor/utils.py
def check(varname: str, value, expected_types: list):
    """
    One-liner syntactic sugar for checking types.
    It can also check callables.

    Example usage:

    ```python
    check('x', x, [int, float])
    ```

    :param varname: The name of the variable (for diagnostic error message).
    :param value: The value of the `varname`.
    :param expected_types: The type(s) the item is expected to be.
    :raises TypeError: if data is not the expected type.
    """
    is_expected_type: bool = False
    for t in expected_types:
        if t is callable:
            is_expected_type = t(value)
        else:
            is_expected_type = isinstance(value, t)
        if is_expected_type:
            break

    if not is_expected_type:
        raise TypeError(f"{varname} should be one of {expected_types}.")
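
The behaviour described above can be tried in isolation. This is a minimal sketch, assuming a local copy of the `check` logic from the listing so that the snippet runs without pyjanitor installed. Passing the builtin `callable` in `expected_types` is what enables the callable check.

```python
def check(varname, value, expected_types):
    # Local copy of the logic in the listing above (an assumption made so
    # the snippet is self-contained; normally imported from janitor.utils).
    is_expected_type = False
    for t in expected_types:
        if t is callable:
            # `t` is the builtin `callable`, so this tests callable(value).
            is_expected_type = t(value)
        else:
            is_expected_type = isinstance(value, t)
        if is_expected_type:
            break
    if not is_expected_type:
        raise TypeError(f"{varname} should be one of {expected_types}.")


check("x", 3, [int, float])      # passes: 3 is an int
check("func", len, [callable])   # passes: len is callable

message = None
try:
    check("x", "3", [int, float])  # a str is neither int nor float
except TypeError as err:
    message = str(err)
```

The error message names the variable, which is why `varname` is passed separately from `value`.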

check_column(df, column_names, present=True)

One-liner syntactic sugar for checking the presence or absence of columns.

Example usage:

check_column(df, ['a', 'b'], present=True)

This will check whether columns 'a' and 'b' are present in df's columns.

One can also guarantee that 'a' and 'b' are not present by switching to present=False.

Parameters:

Name Type Description Default
df DataFrame

The DataFrame whose columns are checked.

required
column_names Union[Iterable, str]

A list of column names we want to check to see if present (or absent) in df.

required
present bool

If True (default), checks to see if all of column_names are in df.columns. If False, checks that none of column_names are in df.columns.

True

Exceptions:

Type Description
ValueError

if a column in column_names is missing when present=True, or already present when present=False.

Source code in janitor/utils.py
def check_column(
    df: pd.DataFrame, column_names: Union[Iterable, str], present: bool = True
):
    """
    One-liner syntactic sugar for checking the presence or absence
    of columns.

    Example usage:

    ```python
    check_column(df, ['a', 'b'], present=True)
    ```

    This will check whether columns `'a'` and `'b'` are present in
    `df`'s columns.

    One can also guarantee that `'a'` and `'b'` are not present
    by switching to `present=False`.

    :param df: The DataFrame whose columns are checked.
    :param column_names: A list of column names we want to check to see if
        present (or absent) in `df`.
    :param present: If `True` (default), checks to see if all of `column_names`
        are in `df.columns`. If `False`, checks that none of `column_names` are
        in `df.columns`.
    :raises ValueError: if a column in `column_names` is missing when
        `present=True`, or already present when `present=False`.
    """
    if isinstance(column_names, str) or not isinstance(column_names, Iterable):
        column_names = [column_names]

    for column_name in column_names:
        if present and column_name not in df.columns:  # skipcq: PYL-R1720
            raise ValueError(
                f"{column_name} not present in dataframe columns!"
            )
        elif not present and column_name in df.columns:
            raise ValueError(
                f"{column_name} already present in dataframe columns!"
            )
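
The two modes of `present` and the handling of a single string can be sketched as follows. This is an illustrative sketch only: it uses a local copy of the logic above, and a `SimpleNamespace` with a `columns` attribute stands in for a pandas DataFrame (an assumption made so the snippet needs only the standard library; the function reads nothing but `df.columns`).

```python
from collections.abc import Iterable
from types import SimpleNamespace


def check_column(df, column_names, present=True):
    # Local copy of the logic in the listing above.
    if isinstance(column_names, str) or not isinstance(column_names, Iterable):
        column_names = [column_names]  # a single name becomes a one-item list
    for column_name in column_names:
        if present and column_name not in df.columns:
            raise ValueError(f"{column_name} not present in dataframe columns!")
        elif not present and column_name in df.columns:
            raise ValueError(
                f"{column_name} already present in dataframe columns!"
            )


# Hypothetical stand-in for a DataFrame with columns 'a' and 'b'.
df = SimpleNamespace(columns=["a", "b"])

check_column(df, ["a", "b"])            # both present: no error
check_column(df, "a")                   # a bare string is accepted
check_column(df, ["c"], present=False)  # 'c' is absent: no error

errors = []
for names, present in [(["a", "c"], True), ("b", False)]:
    try:
        check_column(df, names, present=present)
    except ValueError as err:
        errors.append(str(err))
```

The check stops at the first offending column, so only one column is reported per call.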

deprecated_alias(**aliases)

Used as a decorator when deprecating old function argument names, while keeping backwards compatibility. Implementation is inspired by StackOverflow.

Functional usage example:

@deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
    return alpha + beta

Parameters:

Name Type Description Default
aliases

Dictionary of aliases for a function's arguments.

{}

Returns:

Type Description
Callable

Your original function wrapped with the kwarg redirection function.

Source code in janitor/utils.py
def deprecated_alias(**aliases) -> Callable:
    """
    Used as a decorator when deprecating old function argument names, while
    keeping backwards compatibility. Implementation is inspired by [`StackOverflow`][stack_link].

    [stack_link]: https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias

    Functional usage example:

    ```python
    @deprecated_alias(a='alpha', b='beta')
    def simple_sum(alpha, beta):
        return alpha + beta
    ```

    :param aliases: Dictionary of aliases for a function's arguments.
    :return: Your original function wrapped with the `kwarg` redirection
        function.
    """  # noqa: E501

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rename_kwargs(func.__name__, kwargs, aliases)
            return func(*args, **kwargs)

        return wrapper

    return decorator
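
To see the redirection at work, here is a minimal sketch. It assumes local copies of `deprecated_alias` and of the `rename_kwargs` helper it calls, so that it runs without pyjanitor installed, and it captures warnings explicitly because Python hides `DeprecationWarning` in many contexts by default.

```python
import warnings
from functools import wraps


def rename_kwargs(func_name, kwargs, aliases):
    # Local copy of the helper documented further down this page.
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warnings.warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)


def deprecated_alias(**aliases):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rename_kwargs(func.__name__, kwargs, aliases)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_alias(a="alpha", b="beta")
def simple_sum(alpha, beta):
    return alpha + beta


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = simple_sum(a=1, beta=2)  # old name `a` is redirected to `alpha`

conflict = None
try:
    simple_sum(a=1, alpha=1, beta=2)  # old and new name together
except TypeError as err:
    conflict = str(err)
```

Callers using the old name keep working and get a warning; only supplying both names at once is treated as an error.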

deprecated_kwargs(*arguments, message="The keyword argument '{argument}' of '{func_name}' is deprecated.", error=True)

Used as a decorator when deprecating function's keyword arguments.

Example:

from janitor.utils import deprecated_kwargs

@deprecated_kwargs('x', 'y')
def plus(a, b, x=0, y=0):
    return a + b

Parameters:

Name Type Description Default
arguments list

The list of deprecated keyword arguments.

()
message str

The message of the ValueError or DeprecationWarning. It should be a string or a string template. If it is a template, the placeholders func_name and argument are filled in.

"The keyword argument '{argument}' of '{func_name}' is deprecated."
error bool

If True, raises ValueError; otherwise emits a DeprecationWarning.

True

Returns:

Type Description
Callable

The original function wrapped with the deprecated kwargs checking function.

Exceptions:

Type Description
ValueError

If one of arguments is in the decorated function's keyword arguments.

Source code in janitor/utils.py
def deprecated_kwargs(
    *arguments: list[str],
    message: str = (
        "The keyword argument '{argument}' of '{func_name}' is deprecated."
    ),
    error: bool = True,
) -> Callable:
    """
    Used as a decorator when deprecating function's keyword arguments.

    Example:

    ```python
    from janitor.utils import deprecated_kwargs

    @deprecated_kwargs('x', 'y')
    def plus(a, b, x=0, y=0):
        return a + b
    ```

    :param arguments: The list of deprecated keyword arguments.
    :param message: The message of `ValueError` or `DeprecationWarning`.
        It should be a string or a string template. If it is a template,
        the placeholders `func_name` and `argument` are filled in.
    :param error: If `True`, raises `ValueError`; otherwise emits a
        `DeprecationWarning`.
    :return: The original function wrapped with the deprecated `kwargs`
        checking function.
    :raises ValueError: If one of `arguments` is in the decorated function's
        keyword arguments.  # noqa: DAR402
    """

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for argument in arguments:
                if argument in kwargs:
                    msg = message.format(
                        func_name=func.__name__,
                        argument=argument,
                    )
                    if error:
                        raise ValueError(msg)
                    else:
                        warn(msg, DeprecationWarning)

            return func(*args, **kwargs)

        return wrapper

    return decorator
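
The difference between `error=True` and `error=False`, and the use of a message template, can be sketched as below. This assumes a local copy of the decorator from the listing above; `plus_warn` and its custom message are hypothetical names introduced for illustration.

```python
import warnings
from functools import wraps


def deprecated_kwargs(
    *arguments,
    message="The keyword argument '{argument}' of '{func_name}' is deprecated.",
    error=True,
):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for argument in arguments:
                if argument in kwargs:
                    msg = message.format(
                        func_name=func.__name__, argument=argument
                    )
                    if error:
                        raise ValueError(msg)
                    warnings.warn(msg, DeprecationWarning)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_kwargs("x", "y")
def plus(a, b, x=0, y=0):
    return a + b


@deprecated_kwargs(
    "x", message="'{argument}' is going away from '{func_name}'.", error=False
)
def plus_warn(a, b, x=0):
    return a + b


error_text = None
try:
    plus(1, 2, x=1)  # deprecated keyword with error=True
except ValueError as err:
    error_text = str(err)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned_result = plus_warn(1, 2, x=1)  # error=False only warns

positional_result = plus(1, 2, 1)  # passed positionally, so not detected
```

Because the wrapper inspects only `kwargs`, a deprecated argument passed positionally goes unnoticed, as the last line shows.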

idempotent(func, df, *args, **kwargs)

Raises an error if a function operating on a DataFrame is not idempotent. That is, func(func(df)) = func(df) is not True for all df.

Parameters:

Name Type Description Default
func Callable

A Python method.

required
df DataFrame

A pandas DataFrame.

required
args

Positional arguments supplied to the method.

()
kwargs

Keyword arguments supplied to the method.

{}

Exceptions:

Type Description
ValueError

If func is found to not be idempotent for the given DataFrame (df).

Source code in janitor/utils.py
def idempotent(func: Callable, df: pd.DataFrame, *args, **kwargs):
    """
    Raises an error if a function operating on a DataFrame is not idempotent.
    That is, `func(func(df)) = func(df)` is not `True` for all `df`.

    :param func: A Python method.
    :param df: A pandas `DataFrame`.
    :param args: Positional arguments supplied to the method.
    :param kwargs: Keyword arguments supplied to the method.
    :raises ValueError: If `func` is found to not be idempotent for the given
        DataFrame (`df`).
    """
    if not func(df, *args, **kwargs) == func(
        func(df, *args, **kwargs), *args, **kwargs
    ):
        raise ValueError(
            "Supplied function is not idempotent for the given DataFrame."
        )

import_message(submodule, package, conda_channel=None, pip_install=False)

Print an installation message when a required optional package is not found.

Generic message for indicating to the user when a function relies on an optional module / package that is not currently installed. Includes installation instructions. Used in chemistry.py and biology.py.

Parameters:

Name Type Description Default
submodule str

pyjanitor submodule that needs an external dependency.

required
package str

External package this submodule relies on.

required
conda_channel str

conda channel package can be installed from, if at all.

None
pip_install bool

Whether package can be installed via pip.

False
Source code in janitor/utils.py
def import_message(
    submodule: str,
    package: str,
    conda_channel: str = None,
    pip_install: bool = False,
):
    """
    Print an installation message when a required optional package is not found.

    Generic message for indicating to the user when a function relies on an
    optional module / package that is not currently installed. Includes
    installation instructions. Used in `chemistry.py` and `biology.py`.

    :param submodule: `pyjanitor` submodule that needs an external dependency.
    :param package: External package this submodule relies on.
    :param conda_channel: `conda` channel package can be installed from,
        if at all.
    :param pip_install: Whether package can be installed via `pip`.
    """
    is_conda = os.path.exists(os.path.join(sys.prefix, "conda-meta"))
    installable = True
    if is_conda:
        if conda_channel is None:
            installable = False
            installation = f"{package} cannot be installed via conda"
        else:
            installation = f"conda install -c {conda_channel} {package}"
    else:
        if pip_install:
            installation = f"pip install {package}"
        else:
            installable = False
            installation = f"{package} cannot be installed via pip"

    print(
        f"To use the janitor submodule {submodule}, you need to install "
        f"{package}."
    )
    print()
    if installable:
        print("To do so, use the following command:")
        print()
        print(f"    {installation}")
    else:
        print(f"{installation}")

is_connected(url)

This is a helper function to check if the client is connected to the internet.

Example:

print(is_connected("www.google.com"))
console >> True

Parameters:

Name Type Description Default
url str

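The host to test the connection against on port 80. Since it is passed to socket.create_connection, it should be a host name such as "www.google.com", without a scheme like "http://".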

required

Returns:

Type Description
bool

True if a connection to the host could be established. A failed connection raises OSError rather than returning False.

Exceptions:

Type Description
OSError

if connection to URL cannot be established

Source code in janitor/utils.py
def is_connected(url: str) -> bool:
    """
    This is a helper function to check if the client
    is connected to the internet.

    Example:
        print(is_connected("www.google.com"))
        console >> True

    :param url: We take a test url to check if we are
        able to create a valid connection.
    :raises OSError: if connection to `URL` cannot be
        established
    :return: We return a boolean that signifies our
        connection to the internet
    """
    try:
        sock = socket.create_connection((url, 80))
        if sock is not None:
            sock.close()
            return True
    except OSError as e:

        warn(
            "There was an issue connecting to the internet. "
            "Please see original error below."
        )
        raise e
    return False

refactored_function(message)

Used as a decorator when refactoring functions.

Implementation is inspired by Hacker Noon.

Functional usage example:

@refactored_function(
    message="simple_sum() has been refactored. Use hard_sum() instead."
)
def simple_sum(alpha, beta):
    return alpha + beta

Parameters:

Name Type Description Default
message str

Message to use in warning user about refactoring.

required

Returns:

Type Description
Callable

Your original function wrapped with a function that emits the refactoring warning.

Source code in janitor/utils.py
def refactored_function(message: str) -> Callable:
    """
    Used as a decorator when refactoring functions.

    Implementation is inspired by [`Hacker Noon`][hacker_link].

    [hacker_link]: https://hackernoon.com/why-refactoring-how-to-restructure-python-package-51b89aa91987

    Functional usage example:

    ```python
    @refactored_function(
        message="simple_sum() has been refactored. Use hard_sum() instead."
    )
    def simple_sum(alpha, beta):
        return alpha + beta
    ```

    :param message: Message to use in warning user about refactoring.
    :return: Your original function wrapped with a function that emits
        the refactoring warning.
    """  # noqa: E501

    def decorator(func):
        def emit_warning(*args, **kwargs):
            warn(message, FutureWarning)
            return func(*args, **kwargs)

        return emit_warning

    return decorator
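
A short sketch of what a caller observes, assuming a local copy of the decorator from the listing above so that it runs without pyjanitor installed. The warning is emitted on every call, and the wrapped function still returns its normal result.

```python
import warnings


def refactored_function(message):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        def emit_warning(*args, **kwargs):
            warnings.warn(message, FutureWarning)
            return func(*args, **kwargs)
        return emit_warning
    return decorator


@refactored_function(
    message="simple_sum() has been refactored. Use hard_sum() instead."
)
def simple_sum(alpha, beta):
    return alpha + beta


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    total = simple_sum(1, 2)

# The listing does not apply functools.wraps, so the decorated function
# reports the wrapper's name instead of its own.
wrapper_name = simple_sum.__name__
```

Unlike `deprecated_alias`, this wrapper does not preserve the original function's metadata, which is worth keeping in mind when introspecting decorated functions.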

rename_kwargs(func_name, kwargs, aliases)

Used to update deprecated argument names with new names. Raises a TypeError if both the old and the new name are provided, and warns if the old alias is used. Nothing is returned, as the passed kwargs are modified in place. Implementation is inspired by StackOverflow.

Parameters:

Name Type Description Default
func_name str

name of decorated function.

required
kwargs Dict

Arguments supplied to the method.

required
aliases Dict

Dictionary of aliases for a function's arguments.

required

Exceptions:

Type Description
TypeError

if both arguments are provided.

Source code in janitor/utils.py
def rename_kwargs(func_name: str, kwargs: Dict, aliases: Dict):
    """
    Used to update deprecated argument names with new names. Throws a
    `TypeError` if both arguments are provided, and warns if old alias
    is used. Nothing is returned as the passed `kwargs` are modified
    directly. Implementation is inspired by [`StackOverflow`][stack_link].

    [stack_link]: https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias

    :param func_name: name of decorated function.
    :param kwargs: Arguments supplied to the method.
    :param aliases: Dictionary of aliases for a function's arguments.
    :raises TypeError: if both arguments are provided.
    """  # noqa: E501
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)
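
The in-place update can be observed directly on a plain dictionary. This is a minimal sketch assuming a local copy of the function above; the `kwargs` and `aliases` values are made up for illustration.

```python
import warnings


def rename_kwargs(func_name, kwargs, aliases):
    # Local copy of the function in the listing above.
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warnings.warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)


kwargs = {"a": 1, "beta": 2}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    returned = rename_kwargs("simple_sum", kwargs, {"a": "alpha", "b": "beta"})

# `kwargs` itself now holds the new name; the function returned None.
```

Only aliases that actually appear in `kwargs` trigger a warning, so the unused `b` alias is silently skipped here.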

skiperror(f, return_x=False, return_val=nan)

Decorator for escaping any error in a function.

Example usage:

df[column].apply(
    skiperror(transform, return_val=3, return_x=False))

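# Can also be used as a decorator. skiperror takes the function as its
# first argument, so bind the options first
# (requires `from functools import partial`)
@partial(skiperror, return_val=3, return_x=False)
def transform(x):
    pass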

Parameters:

Name Type Description Default
f Callable

the function to be wrapped.

required
return_x bool

whether or not the original value that caused error should be returned.

False
return_val

the value to be returned when an error hits. Ignored if return_x is True.

nan

Returns:

Type Description
Callable

the wrapped function.

Source code in janitor/utils.py
def skiperror(
    f: Callable, return_x: bool = False, return_val=np.nan
) -> Callable:
    """
    Decorator for escaping any error in a function.

    Example usage:

    ```python
    df[column].apply(
        skiperror(transform, return_val=3, return_x=False))

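    # Can also be used as a decorator. skiperror takes the function as its
    # first argument, so bind the options first
    # (requires `from functools import partial`)
    @partial(skiperror, return_val=3, return_x=False)
    def transform(x):
        pass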
    ```
    :param f: the function to be wrapped.
    :param return_x: whether or not the original value that caused error
        should be returned.
    :param return_val: the value to be returned when an error hits.
        Ignored if `return_x` is `True`.
    :returns: the wrapped function.
    """

    def _wrapped(x, *args, **kwargs):
        try:
            return f(x, *args, **kwargs)
        except Exception:  # skipcq: PYL-W0703
            if return_x:
                return x
            return return_val

    return _wrapped
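
The interplay of `return_x` and `return_val` can be sketched as follows. This is an illustrative sketch using a local copy of the logic above, with `float("nan")` standing in for `np.nan` (an assumption made so that the snippet needs only the standard library). `safe_int`, `keep_input` and `reciprocal` are hypothetical names.

```python
import math
from functools import partial


def skiperror(f, return_x=False, return_val=float("nan")):
    # Local copy of the listing above; float("nan") replaces np.nan.
    def _wrapped(x, *args, **kwargs):
        try:
            return f(x, *args, **kwargs)
        except Exception:
            if return_x:
                return x
            return return_val
    return _wrapped


safe_int = skiperror(int, return_val=-1)    # fall back to -1 on error
keep_input = skiperror(int, return_x=True)  # fall back to the input itself
default_nan = skiperror(int)                # fall back to NaN


# As a decorator with options: bind the keyword arguments first, because
# skiperror expects the function as its first positional argument.
@partial(skiperror, return_val=3)
def reciprocal(x):
    return 1 / x


outcomes = [safe_int("42"), safe_int("abc"), keep_input("abc"), reciprocal(0)]
```

Since every `Exception` is swallowed, genuine bugs in the wrapped function are hidden as well, so a distinctive `return_val` makes skipped values easier to spot afterwards.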

skipna(f)

Decorator for escaping np.nan and None in a function.

Example usage:

df[column].apply(skipna(transform))

# Can also be used as shown below
@skipna
def transform(x):
    pass

Parameters:

Name Type Description Default
f Callable

the function to be wrapped.

required

Returns:

Type Description
Callable

the wrapped function.

Source code in janitor/utils.py
def skipna(f: Callable) -> Callable:
    """
    Decorator for escaping `np.nan` and `None` in a function.

    Example usage:

    ```python
    df[column].apply(skipna(transform))

    # Can also be used as shown below
    @skipna
    def transform(x):
        pass
    ```

    :param f: the function to be wrapped.
    :returns: the wrapped function.
    """

    def _wrapped(x, *args, **kwargs):
        if (type(x) is float and np.isnan(x)) or x is None:
            return np.nan
        return f(x, *args, **kwargs)

    return _wrapped
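
A minimal sketch of the effect on a list of mixed values, assuming a local copy of the logic above in which `math.isnan` and `float("nan")` stand in for `np.isnan` and `np.nan` (so that only the standard library is needed). `shout` is a hypothetical transform.

```python
import math


def skipna(f):
    # Local copy of the listing above with stdlib stand-ins for NumPy.
    def _wrapped(x, *args, **kwargs):
        if (type(x) is float and math.isnan(x)) or x is None:
            return float("nan")  # missing values bypass `f` entirely
        return f(x, *args, **kwargs)
    return _wrapped


@skipna
def shout(x):
    return str(x).upper()


results = [shout(v) for v in ["a", None, float("nan"), 1.5]]
```

Both `None` and NaN inputs come back as NaN, while ordinary values, including non-NaN floats, are passed through to the wrapped function.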