Utils
Miscellaneous internal PyJanitor helper functions.
check(varname, value, expected_types)
One-liner syntactic sugar for checking types. It can also check callables.
Example usage:
check('x', x, [int, float])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
varname | str | The name of the variable (for diagnostic error message). | required |
value | | The value of the `varname`. | required |
expected_types | list | The type(s) the item is expected to be. | required |
Exceptions:
Type | Description |
---|---|
TypeError | if data is not the expected type. |
Source code in janitor/utils.py
def check(varname: str, value, expected_types: list):
    """
    One-liner syntactic sugar for checking types.
    It can also check callables.
    Example usage:
    ```python
    check('x', x, [int, float])
    ```
    :param varname: The name of the variable (for diagnostic error message).
    :param value: The value of the `varname`.
    :param expected_types: The type(s) the item is expected to be.
    :raises TypeError: if data is not the expected type.
    """
    is_expected_type: bool = False
    for t in expected_types:
        if t is callable:
            is_expected_type = t(value)
        else:
            is_expected_type = isinstance(value, t)
        if is_expected_type:
            break
    if not is_expected_type:
        raise TypeError(f"{varname} should be one of {expected_types}.")
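A minimal usage sketch, assuming the `check` listing above is copied locally so that it runs without pyjanitor installed. It illustrates the `callable` sentinel: when the builtin `callable` appears in `expected_types`, the value is tested with `callable(value)` instead of `isinstance`.

```python
def check(varname: str, value, expected_types: list):
    # Local copy of the listing above, so the sketch is self-contained.
    is_expected_type = False
    for t in expected_types:
        if t is callable:
            is_expected_type = t(value)
        else:
            is_expected_type = isinstance(value, t)
        if is_expected_type:
            break
    if not is_expected_type:
        raise TypeError(f"{varname} should be one of {expected_types}.")


check("x", 3.5, [int, float])      # passes: 3.5 is a float
check("fn", len, [str, callable])  # passes: len is callable

try:
    check("x", "3.5", [int, float])  # a string is neither int nor float
except TypeError as err:
    message = str(err)
```

The loop stops at the first matching entry, so the order of `expected_types` does not change the outcome.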
check_column(df, column_names, present=True)
One-liner syntactic sugar for checking the presence or absence of columns.
Example usage:
check_column(df, ['a', 'b'], present=True)
This will check whether columns `'a'` and `'b'` are present in `df`'s columns.
One can also guarantee that `'a'` and `'b'` are not present by switching to `present=False`.
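Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | The DataFrame whose columns are checked. | required |
column_names | Union[Iterable, str] | A list of column names we want to check to see if present (or absent) in `df`. | required |
present | bool | If `True` (default), checks to see if all of `column_names` are in `df.columns`. If `False`, checks that none of `column_names` are in `df.columns`. | True |
Exceptions:
Type | Description |
---|---|
ValueError | if a column in `column_names` is missing when `present=True`, or is found when `present=False`. |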
Source code in janitor/utils.py
def check_column(
    df: pd.DataFrame, column_names: Union[Iterable, str], present: bool = True
):
    """
    One-liner syntactic sugar for checking the presence or absence
    of columns.
    Example usage:
    ```python
    check_column(df, ['a', 'b'], present=True)
    ```
    This will check whether columns `'a'` and `'b'` are present in
    `df`'s columns.
    One can also guarantee that `'a'` and `'b'` are not present
    by switching to `present=False`.
    :param df: The DataFrame whose columns are checked.
    :param column_names: A list of column names we want to check to see if
        present (or absent) in `df`.
    :param present: If `True` (default), checks to see if all of `column_names`
        are in `df.columns`. If `False`, checks that none of `column_names` are
        in `df.columns`.
    :raises ValueError: if a column in `column_names` is missing when
        `present=True`, or is found when `present=False`.
    """
    if isinstance(column_names, str) or not isinstance(column_names, Iterable):
        column_names = [column_names]
    for column_name in column_names:
        if present and column_name not in df.columns:  # skipcq: PYL-R1720
            raise ValueError(
                f"{column_name} not present in dataframe columns!"
            )
        elif not present and column_name in df.columns:
            raise ValueError(
                f"{column_name} already present in dataframe columns!"
            )
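A small sketch of the logic above, assuming a local copy of `check_column` and a hypothetical stand-in object that only exposes a `.columns` attribute (so the example runs without pandas). It shows both modes and the wrapping of a bare string into a one-element list.

```python
from collections.abc import Iterable
from types import SimpleNamespace


def check_column(df, column_names, present=True):
    # Local copy of the logic in the listing above.
    if isinstance(column_names, str) or not isinstance(column_names, Iterable):
        column_names = [column_names]
    for column_name in column_names:
        if present and column_name not in df.columns:
            raise ValueError(f"{column_name} not present in dataframe columns!")
        elif not present and column_name in df.columns:
            raise ValueError(f"{column_name} already present in dataframe columns!")


# Hypothetical stand-in: any object with a `.columns` attribute works here.
df = SimpleNamespace(columns=["a", "b"])

check_column(df, ["a", "b"])          # passes: both columns are present
check_column(df, "c", present=False)  # passes: "c" is wrapped as ["c"] and is absent

try:
    check_column(df, ["a", "c"])      # "c" is missing
except ValueError as err:
    missing_message = str(err)
```

Wrapping a string matters because a string is itself iterable; without that step a name such as `"ab"` would be checked character by character.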
deprecated_alias(**aliases)
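Used as a decorator when deprecating old function argument names, while
keeping backwards compatibility. Implementation is inspired by [StackOverflow](https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias).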
Functional usage example:
@deprecated_alias(a='alpha', b='beta')
def simple_sum(alpha, beta):
    return alpha + beta
Parameters:
Name | Type | Description | Default |
---|---|---|---|
aliases | | Dictionary of aliases for a function's arguments. | {} |
Returns:
Type | Description |
---|---|
Callable | Your original function wrapped with the `kwarg` redirection function. |
Source code in janitor/utils.py
def deprecated_alias(**aliases) -> Callable:
    """
    Used as a decorator when deprecating old function argument names, while
    keeping backwards compatibility. Implementation is inspired from [`StackOverflow`][stack_link].
    [stack_link]: https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias
    Functional usage example:
    ```python
    @deprecated_alias(a='alpha', b='beta')
    def simple_sum(alpha, beta):
        return alpha + beta
    ```
    :param aliases: Dictionary of aliases for a function's arguments.
    :return: Your original function wrapped with the `kwarg` redirection
        function.
    """  # noqa: E501
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rename_kwargs(func.__name__, kwargs, aliases)
            return func(*args, **kwargs)
        return wrapper
    return decorator
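The following sketch shows what a caller sees when using an old argument name. It assumes local copies of `deprecated_alias` and of the `rename_kwargs` helper documented further down this page, and uses the standard `warnings` module to capture the emitted `DeprecationWarning`.

```python
import warnings
from functools import wraps


def rename_kwargs(func_name, kwargs, aliases):
    # Local copy of the helper documented later on this page.
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warnings.warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)


def deprecated_alias(**aliases):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            rename_kwargs(func.__name__, kwargs, aliases)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_alias(a="alpha", b="beta")
def simple_sum(alpha, beta):
    return alpha + beta


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    total = simple_sum(a=1, beta=2)  # old name `a` is redirected to `alpha`
```

Only keyword arguments are rewritten; positional arguments pass through untouched.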
deprecated_kwargs(*arguments, message="The keyword argument '{argument}' of '{func_name}' is deprecated.", error=True)
Used as a decorator when deprecating function's keyword arguments.
Example:
from janitor.utils import deprecated_kwargs
@deprecated_kwargs('x', 'y')
def plus(a, b, x=0, y=0):
    return a + b
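Parameters:
Name | Type | Description | Default |
---|---|---|---|
arguments | list | The list of deprecated keyword arguments. | () |
message | str | The message of `ValueError` or `DeprecationWarning`. It should be a string or a string template. A string template is formatted with `func_name` and `argument`. | "The keyword argument '{argument}' of '{func_name}' is deprecated." |
error | bool | If `True`, raises `ValueError`; otherwise emits a `DeprecationWarning`. | True |
Returns:
Type | Description |
---|---|
Callable | The original function wrapped with the deprecated `kwargs` checking function. |
Exceptions:
Type | Description |
---|---|
ValueError | If one of `arguments` is in the decorated function's keyword arguments and `error` is `True`. |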
Source code in janitor/utils.py
def deprecated_kwargs(
    *arguments: list[str],
    message: str = (
        "The keyword argument '{argument}' of '{func_name}' is deprecated."
    ),
    error: bool = True,
) -> Callable:
    """
    Used as a decorator when deprecating function's keyword arguments.
    Example:
    ```python
    from janitor.utils import deprecated_kwargs
    @deprecated_kwargs('x', 'y')
    def plus(a, b, x=0, y=0):
        return a + b
    ```
    :param arguments: The list of deprecated keyword arguments.
    :param message: The message of `ValueError` or `DeprecationWarning`.
        It should be a string or a string template. If a string template
        defaults input `func_name` and `argument`.
    :param error: If True raises `ValueError` else returns
        `DeprecationWarning`.
    :return: The original function wrapped with the deprecated `kwargs`
        checking function.
    :raises ValueError: If one of `arguments` is in the decorated function's
        keyword arguments. # noqa: DAR402
    """
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for argument in arguments:
                if argument in kwargs:
                    msg = message.format(
                        func_name=func.__name__,
                        argument=argument,
                    )
                    if error:
                        raise ValueError(msg)
                    else:
                        warn(msg, DeprecationWarning)
            return func(*args, **kwargs)
        return wrapper
    return decorator
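A sketch contrasting the two modes, assuming a local copy of `deprecated_kwargs` based on the listing above. The functions `plus` and `strict_plus` are illustrative names only.

```python
import warnings
from functools import wraps


def deprecated_kwargs(
    *arguments,
    message="The keyword argument '{argument}' of '{func_name}' is deprecated.",
    error=True,
):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for argument in arguments:
                if argument in kwargs:
                    msg = message.format(
                        func_name=func.__name__, argument=argument
                    )
                    if error:
                        raise ValueError(msg)
                    else:
                        warnings.warn(msg, DeprecationWarning)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated_kwargs("x", error=False)
def plus(a, b, x=0):
    return a + b


@deprecated_kwargs("x")
def strict_plus(a, b, x=0):
    return a + b


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = plus(1, 2, x=5)  # warns, then still calls the function

try:
    strict_plus(1, 2, x=5)    # error=True (the default) raises instead
except ValueError as err:
    error_message = str(err)
```

The check looks only at `kwargs`, so a deprecated argument passed positionally, as in `plus(1, 2, 5)`, is not detected.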
idempotent(func, df, *args, **kwargs)
Raises an error if a function operating on a DataFrame is not idempotent.
That is, `func(func(df)) = func(df)` is not `True` for all `df`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func | Callable | A Python method. | required |
df | DataFrame | A pandas `DataFrame`. | required |
args | | Positional arguments supplied to the method. | () |
kwargs | | Keyword arguments supplied to the method. | {} |
Exceptions:
Type | Description |
---|---|
ValueError | If `func` is found to not be idempotent for the given DataFrame (`df`). |
Source code in janitor/utils.py
def idempotent(func: Callable, df: pd.DataFrame, *args, **kwargs):
    """
    Raises an error if a function operating on a DataFrame is not idempotent.
    That is, `func(func(df)) = func(df)` is not `True` for all `df`.
    :param func: A Python method.
    :param df: A pandas `DataFrame`.
    :param args: Positional arguments supplied to the method.
    :param kwargs: Keyword arguments supplied to the method.
    :raises ValueError: If `func` is found to not be idempotent for the given
        DataFrame (`df`).
    """
    if not func(df, *args, **kwargs) == func(
        func(df, *args, **kwargs), *args, **kwargs
    ):
        raise ValueError(
            "Supplied function is not idempotent for the given DataFrame."
        )
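To make the idempotency condition concrete, here is a sketch that runs a local copy of the check on plain Python lists. The lists are hypothetical stand-ins for the DataFrame, chosen so the example needs no pandas; `append_zero` is an illustrative function.

```python
def idempotent(func, df, *args, **kwargs):
    # Local copy of the listing above.
    if not func(df, *args, **kwargs) == func(
        func(df, *args, **kwargs), *args, **kwargs
    ):
        raise ValueError(
            "Supplied function is not idempotent for the given DataFrame."
        )


# Stand-in data: a list plays the role of `df` in this sketch.
idempotent(sorted, [3, 1, 2])  # passes: sorting twice equals sorting once


def append_zero(values):
    # Not idempotent: every application adds another element.
    return values + [0]


try:
    idempotent(append_zero, [1])
except ValueError as err:
    failure = str(err)
```

With a real pandas `DataFrame`, `==` compares element-wise and returns another DataFrame, so a whole-frame comparison such as `DataFrame.equals` may be the better fit; that remark is about pandas behavior and is not stated by the listing.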
import_message(submodule, package, conda_channel=None, pip_install=False)
Return warning if package is not found.
Generic message for indicating to the user when a function relies on an
optional module / package that is not currently installed. Includes
installation instructions. Used in `chemistry.py` and `biology.py`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
submodule | str | `pyjanitor` submodule that needs an external dependency. | required |
package | str | External package this submodule relies on. | required |
conda_channel | str | `conda` channel package can be installed from, if at all. | None |
pip_install | bool | Whether package can be installed via `pip`. | False |
Source code in janitor/utils.py
def import_message(
    submodule: str,
    package: str,
    conda_channel: str = None,
    pip_install: bool = False,
):
    """
    Return warning if package is not found.
    Generic message for indicating to the user when a function relies on an
    optional module / package that is not currently installed. Includes
    installation instructions. Used in `chemistry.py` and `biology.py`.
    :param submodule: `pyjanitor` submodule that needs an external dependency.
    :param package: External package this submodule relies on.
    :param conda_channel: `conda` channel package can be installed from,
        if at all.
    :param pip_install: Whether package can be installed via `pip`.
    """
    is_conda = os.path.exists(os.path.join(sys.prefix, "conda-meta"))
    installable = True
    if is_conda:
        if conda_channel is None:
            installable = False
            installation = f"{package} cannot be installed via conda"
        else:
            installation = f"conda install -c {conda_channel} {package}"
    else:
        if pip_install:
            installation = f"pip install {package}"
        else:
            installable = False
            installation = f"{package} cannot be installed via pip"
    print(
        f"To use the janitor submodule {submodule}, you need to install "
        f"{package}."
    )
    print()
    if installable:
        print("To do so, use the following command:")
        print()
        print(f" {installation}")
    else:
        print(f"{installation}")
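A sketch of what the function prints, assuming a condensed local copy of the listing above. The argument values (`"chemistry"`, `"rdkit"`, `"conda-forge"`) are illustrative only, and the second half of the output depends on whether the interpreter runs inside a conda environment.

```python
import io
import os
import sys
from contextlib import redirect_stdout


def import_message(submodule, package, conda_channel=None, pip_install=False):
    # Condensed local copy of the listing above; same branches and messages.
    is_conda = os.path.exists(os.path.join(sys.prefix, "conda-meta"))
    installable = True
    if is_conda:
        if conda_channel is None:
            installable = False
            installation = f"{package} cannot be installed via conda"
        else:
            installation = f"conda install -c {conda_channel} {package}"
    elif pip_install:
        installation = f"pip install {package}"
    else:
        installable = False
        installation = f"{package} cannot be installed via pip"
    print(f"To use the janitor submodule {submodule}, you need to install {package}.")
    print()
    if installable:
        print("To do so, use the following command:")
        print()
        print(f" {installation}")
    else:
        print(installation)


# Capture the printed message so it can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    import_message(
        submodule="chemistry",
        package="rdkit",
        conda_channel="conda-forge",
        pip_install=False,
    )
output = buffer.getvalue()
```

Note that the function prints its message and returns `None`; it does not raise or emit a warning through the `warnings` module.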
is_connected(url)
This is a helper function to check if the client is connected to the internet.
Example:
print(is_connected("www.google.com"))
console >> True
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | We take a test url to check if we are able to create a valid connection. | required |
Returns:
Type | Description |
---|---|
bool | We return a boolean that signifies our connection to the internet |
Exceptions:
Type | Description |
---|---|
OSError | if connection to `URL` cannot be established |
Source code in janitor/utils.py
def is_connected(url: str) -> bool:
    """
    This is a helper function to check if the client
    is connected to the internet.
    Example:
        print(is_connected("www.google.com"))
        console >> True
    :param url: We take a test url to check if we are
        able to create a valid connection.
    :raises OSError: if connection to `URL` cannot be
        established
    :return: We return a boolean that signifies our
        connection to the internet
    """
    try:
        sock = socket.create_connection((url, 80))
        if sock is not None:
            sock.close()
            return True
    except OSError as e:
        warn(
            "There was an issue connecting to the internet. "
            "Please see original error below."
        )
        raise e
    return False
refactored_function(message)
Used as a decorator when refactoring functions.
Implementation is inspired by [Hacker Noon](https://hackernoon.com/why-refactoring-how-to-restructure-python-package-51b89aa91987).
Functional usage example:
@refactored_function(
    message="simple_sum() has been refactored. Use hard_sum() instead."
)
def simple_sum(alpha, beta):
    return alpha + beta
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str | Message to use in warning user about refactoring. | required |
Returns:
Type | Description |
---|---|
Callable | Your original function wrapped so that it emits a `FutureWarning` with `message` when called. |
Source code in janitor/utils.py
def refactored_function(message: str) -> Callable:
    """
    Used as a decorator when refactoring functions.
    Implementation is inspired from [`Hacker Noon`][hacker_link].
    [hacker_link]: https://hackernoon.com/why-refactoring-how-to-restructure-python-package-51b89aa91987
    Functional usage example:
    ```python
    @refactored_function(
        message="simple_sum() has been refactored. Use hard_sum() instead."
    )
    def simple_sum(alpha, beta):
        return alpha + beta
    ```
    :param message: Message to use in warning user about refactoring.
    :return: Your original function wrapped so that it emits a
        `FutureWarning` with `message` when called.
    """  # noqa: E501
    def decorator(func):
        def emit_warning(*args, **kwargs):
            warn(message, FutureWarning)
            return func(*args, **kwargs)
        return emit_warning
    return decorator
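A sketch of the caller-side behavior, assuming a local copy of `refactored_function` from the listing above. It captures the `FutureWarning` with the standard `warnings` module and confirms that the wrapped function still returns its result.

```python
import warnings


def refactored_function(message):
    # Local copy of the decorator in the listing above.
    def decorator(func):
        def emit_warning(*args, **kwargs):
            warnings.warn(message, FutureWarning)
            return func(*args, **kwargs)
        return emit_warning
    return decorator


@refactored_function(
    message="simple_sum() has been refactored. Use hard_sum() instead."
)
def simple_sum(alpha, beta):
    return alpha + beta


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is recorded
    total = simple_sum(1, 2)

warning_text = str(caught[0].message)
```

Because the listing does not apply `functools.wraps`, the decorated function reports `emit_warning` as its `__name__` rather than `simple_sum`.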
rename_kwargs(func_name, kwargs, aliases)
Used to update deprecated argument names with new names. Throws a
`TypeError` if both arguments are provided, and warns if old alias
is used. Nothing is returned as the passed `kwargs` are modified
directly. Implementation is inspired by [StackOverflow](https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
func_name | str | name of decorated function. | required |
kwargs | Dict | Arguments supplied to the method. | required |
aliases | Dict | Dictionary of aliases for a function's arguments. | required |
Exceptions:
Type | Description |
---|---|
TypeError | if both arguments are provided. |
Source code in janitor/utils.py
def rename_kwargs(func_name: str, kwargs: Dict, aliases: Dict):
    """
    Used to update deprecated argument names with new names. Throws a
    `TypeError` if both arguments are provided, and warns if old alias
    is used. Nothing is returned as the passed `kwargs` are modified
    directly. Implementation is inspired from [`StackOverflow`][stack_link].
    [stack_link]: https://stackoverflow.com/questions/49802412/how-to-implement-deprecation-in-python-with-argument-alias
    :param func_name: name of decorated function.
    :param kwargs: Arguments supplied to the method.
    :param aliases: Dictionary of aliases for a function's arguments.
    :raises TypeError: if both arguments are provided.
    """  # noqa: E501
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)
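Calling the helper directly makes the in-place mutation visible. The sketch below assumes a local copy of `rename_kwargs` from the listing above; the dictionaries and the `"simple_sum"` name are illustrative.

```python
import warnings


def rename_kwargs(func_name, kwargs, aliases):
    # Local copy of the listing above.
    for old_alias, new_alias in aliases.items():
        if old_alias in kwargs:
            if new_alias in kwargs:
                raise TypeError(
                    f"{func_name} received both {old_alias} and {new_alias}"
                )
            warnings.warn(
                f"{old_alias} is deprecated; use {new_alias}",
                DeprecationWarning,
            )
            kwargs[new_alias] = kwargs.pop(old_alias)


kwargs = {"a": 1, "beta": 2}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    returned = rename_kwargs("simple_sum", kwargs, {"a": "alpha", "b": "beta"})
# `kwargs` itself has been rewritten: the key "a" is now "alpha".

try:
    # Supplying both the old and the new name is ambiguous, so it is rejected.
    rename_kwargs("simple_sum", {"a": 1, "alpha": 3}, {"a": "alpha"})
except TypeError as err:
    conflict = str(err)
```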
skiperror(f, return_x=False, return_val=nan)
Decorator for escaping any error in a function.
Example usage:
df[column].apply(
    skiperror(transform, return_val=3, return_x=False))
# Can also be used as shown below
@skiperror(return_val=3, return_x=False)
def transform(x):
    pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | Callable | the function to be wrapped. | required |
return_x | bool | whether or not the original value that caused error should be returned. | False |
return_val | | the value to be returned when an error hits. Ignored if `return_x` is `True`. | nan |
Returns:
Type | Description |
---|---|
Callable | the wrapped function. |
Source code in janitor/utils.py
def skiperror(
    f: Callable, return_x: bool = False, return_val=np.nan
) -> Callable:
    """
    Decorator for escaping any error in a function.
    Example usage:
    ```python
    df[column].apply(
        skiperror(transform, return_val=3, return_x=False))
    # Can also be used as shown below
    @skiperror(return_val=3, return_x=False)
    def transform(x):
        pass
    ```
    :param f: the function to be wrapped.
    :param return_x: whether or not the original value that caused error
        should be returned.
    :param return_val: the value to be returned when an error hits.
        Ignored if `return_x` is `True`.
    :returns: the wrapped function.
    """
    def _wrapped(x, *args, **kwargs):
        try:
            return f(x, *args, **kwargs)
        except Exception:  # skipcq: PYL-W0703
            if return_x:
                return x
            return return_val
    return _wrapped
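A sketch of both fallback modes, assuming a local copy of `skiperror` in which `float("nan")` stands in for `np.nan` so that no NumPy import is needed. Since `f` is a required positional parameter in the listing, the decorator form with options is written here with `functools.partial`; that is a choice made for this sketch, and `halve` is an illustrative function.

```python
from functools import partial

NAN = float("nan")  # assumption: stands in for np.nan in this sketch


def skiperror(f, return_x=False, return_val=NAN):
    # Local copy of the listing above.
    def _wrapped(x, *args, **kwargs):
        try:
            return f(x, *args, **kwargs)
        except Exception:
            if return_x:
                return x
            return return_val
    return _wrapped


# Functional form: failed conversions fall back to `return_val`.
safe_int = skiperror(int, return_val=-1)
values = [safe_int(v) for v in ["1", "oops", "3"]]


# Decorator form with options, bound through functools.partial.
@partial(skiperror, return_x=True)
def halve(x):
    return x / 2


kept = halve("text")  # division fails, so the original input is returned
half = halve(4)       # no error, so the normal result is returned
```

Every `Exception` subclass is swallowed, so genuine bugs inside `f` are silenced along with expected failures.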
skipna(f)
Decorator for escaping `np.nan` and `None` in a function.
Example usage:
df[column].apply(skipna(transform))
# Can also be used as shown below
@skipna
def transform(x):
    pass
Parameters:
Name | Type | Description | Default |
---|---|---|---|
f | Callable | the function to be wrapped. | required |
Returns:
Type | Description |
---|---|
Callable | the wrapped function. |
Source code in janitor/utils.py
def skipna(f: Callable) -> Callable:
    """
    Decorator for escaping `np.nan` and `None` in a function.
    Example usage:
    ```python
    df[column].apply(skipna(transform))
    # Can also be used as shown below
    @skipna
    def transform(x):
        pass
    ```
    :param f: the function to be wrapped.
    :returns: the wrapped function.
    """
    def _wrapped(x, *args, **kwargs):
        if (type(x) is float and np.isnan(x)) or x is None:
            return np.nan
        return f(x, *args, **kwargs)
    return _wrapped
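A sketch of the pass-through behavior, assuming a local copy of `skipna` in which `math.isnan` and `math.nan` stand in for `np.isnan` and `np.nan` so the example needs only the standard library. `shout` is an illustrative function.

```python
import math


def skipna(f):
    # Local copy of the listing above, with math.* replacing the NumPy calls.
    def _wrapped(x, *args, **kwargs):
        if (type(x) is float and math.isnan(x)) or x is None:
            return math.nan
        return f(x, *args, **kwargs)
    return _wrapped


@skipna
def shout(x):
    return str(x).upper()


# Missing values skip `shout` entirely and come back as NaN.
results = [shout(v) for v in ["a", None, float("nan"), "b"]]
```

Both `None` and NaN inputs are returned as NaN, so a `None` in the input does not survive as `None` in the output.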