# Math

Miscellaneous mathematical operators.

## ecdf(s)

Return the empirical cumulative distribution of the values in a series.

Intended to be used with the following pattern:

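```python
import matplotlib.pyplot as plt
import pandas as pd
import janitor  # registers the .ecdf() Series method

df = pd.DataFrame(...)

# Obtain ECDF values to be plotted
x, y = df["column_name"].ecdf()

# Plot ECDF values
plt.scatter(x, y)
```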


Null values must be dropped from the series before calling `ecdf`; otherwise, a `ValueError` is raised.

If the dtype of the series is not numeric, a `TypeError` is raised.

```python
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({"numbers": [0, 4, 0, 1, 2, 1, 1, 3]})
>>> x, y = df["numbers"].ecdf()
>>> x
array([0, 0, 1, 1, 1, 2, 3, 4])
>>> y
array([0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | A pandas series. dtype should be numeric. | required |

Returns:

| Type | Description |
| --- | --- |
| `Tuple[numpy.ndarray, numpy.ndarray]` | `(x, y)`. `x`: sorted array of values. `y`: cumulative fraction of data points with value `x` or lower. |

Exceptions:

| Type | Description |
| --- | --- |
| `TypeError` | If the series is not numeric. |
| `ValueError` | If the series contains nulls. |

Source code in janitor/math.py

```python
@pf.register_series_method
def ecdf(s: pd.Series) -> Tuple[np.ndarray, np.ndarray]:
    """
    Return cumulative distribution of values in a series.

    Intended to be used with the following pattern:

        df = pd.DataFrame(...)

        # Obtain ECDF values to be plotted
        x, y = df["column_name"].ecdf()

        # Plot ECDF values
        plt.scatter(x, y)

    Null values must be dropped from the series,
    otherwise a ValueError is raised.

    Also, if the dtype of the series is not numeric,
    a TypeError is raised.

    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"numbers": [0, 4, 0, 1, 2, 1, 1, 3]})
    >>> x, y = df["numbers"].ecdf()
    >>> x
    array([0, 0, 1, 1, 1, 2, 3, 4])
    >>> y
    array([0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])

    :param s: A pandas series. dtype should be numeric.
    :returns: (x, y).
        x: sorted array of values.
        y: cumulative fraction of data points with value x or lower.
    :raises TypeError: if series is not numeric.
    :raises ValueError: if series contains nulls.
    """
    if not is_numeric_dtype(s):
        raise TypeError(f"series {s.name} must be numeric!")
    if not s.isna().sum() == 0:
        raise ValueError(f"series {s.name} contains nulls. Please drop them.")

    n = len(s)
    x = np.sort(s)
    y = np.arange(1, n + 1) / n

    return x, y
```
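
The ECDF computation can be illustrated without pandas or NumPy. The following is a minimal sketch using only the standard library; `ecdf_sketch` is a hypothetical helper written for this illustration, not part of janitor, and it mirrors only the null check and the sort-then-rank step.

```python
import math
from typing import List, Sequence, Tuple


def ecdf_sketch(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical list-based stand-in for Series.ecdf()."""
    # Mirror the documented contract: nulls are rejected, not silently dropped.
    if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
        raise ValueError("values contain nulls. Please drop them.")
    n = len(values)
    x = sorted(values)                    # sorted observations
    y = [(i + 1) / n for i in range(n)]   # k-th smallest point maps to k / n
    return x, y


x, y = ecdf_sketch([0, 4, 0, 1, 2, 1, 1, 3])
print(x)  # [0, 0, 1, 1, 1, 2, 3, 4]
print(y)  # [0.125, 0.25, 0.375, 0.5, 0.625, 0.75, 0.875, 1.0]
```

Note that tied values each get their own step, so only the last point in a run of equal `x` values carries the full fraction of observations at or below that value.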


## exp(s)

Take the exponential transform of the series.

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.exp()
0     1.000000
1     2.718282
2    20.085537
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Source code in janitor/math.py

```python
@pf.register_series_method
def exp(s: pd.Series) -> pd.Series:
    """
    Take the exponential transform of the series.

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.exp()
    0     1.000000
    1     2.718282
    2    20.085537
    Name: numbers, dtype: float64

    :param s: Input Series.
    :return: Transformed Series.
    """
    return np.exp(s)
```


## log(s, error='warn')

Take natural logarithm of the Series.

Each value in the series should be positive. Use `error` to control the behavior when the series contains nonpositive entries.

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.log(error="ignore")
0         NaN
1    0.000000
2    1.098612
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |
| `error` | `str` | Determines behavior when taking the log of nonpositive entries. If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and the log of nonpositive values is `np.nan`; defaults to `'warn'`. | `'warn'` |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Exceptions:

| Type | Description |
| --- | --- |
| `RuntimeError` | Raised when there are nonpositive values in the Series and `error='raise'`. |

Source code in janitor/math.py

```python
@pf.register_series_method
def log(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Take natural logarithm of the Series.

    Each value in the series should be positive. Use error to control the
    behavior if there are nonpositive entries in the series.

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.log(error="ignore")
    0         NaN
    1    0.000000
    2    1.098612
    Name: numbers, dtype: float64

    :param s: Input Series.
    :param error: Determines behavior when taking the log of nonpositive
        entries. If 'warn' then a RuntimeWarning is thrown. If 'raise',
        then a RuntimeError is thrown. Otherwise, nothing is thrown and
        log of nonpositive values is np.nan; defaults to 'warn'.
    :raises RuntimeError: Raised when there are nonpositive values in the
        Series and error='raise'.
    :return: Transformed Series.
    """
    s = s.copy()
    nonpositive = s <= 0
    if (nonpositive).any():
        msg = f"Log taken on {nonpositive.sum()} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
    s[nonpositive] = np.nan
    return np.log(s)
```
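
The three `error` modes can be illustrated with a standard-library sketch. `log_sketch` below is a hypothetical list-based helper, not janitor's implementation; it assumes the same message text and the same NaN substitution as the listing above.

```python
import math
import warnings
from typing import List, Sequence


def log_sketch(values: Sequence[float], error: str = "warn") -> List[float]:
    """Hypothetical list-based stand-in for Series.log()."""
    nonpositive = [v <= 0 for v in values]
    if any(nonpositive):
        msg = f"Log taken on {sum(nonpositive)} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
        # Any other value of `error` stays silent.
    # Nonpositive entries become NaN instead of triggering a math domain error.
    return [math.nan if bad else math.log(v) for v, bad in zip(values, nonpositive)]


# Default mode: a RuntimeWarning is recorded and the zero becomes NaN.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warned = log_sketch([0, 1, 3])

# Any value other than "warn" or "raise" suppresses the warning.
silent = log_sketch([0, 1, 3], error="ignore")

# "raise" stops at the first check instead of returning NaN.
try:
    log_sketch([0, 1, 3], error="raise")
except RuntimeError as exc:
    print(exc)  # Log taken on 1 nonpositive value(s)
```

Choosing `error="raise"` is the stricter option when nonpositive inputs indicate a data problem upstream rather than an expected edge case.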


## logit(s, error='warn')

Take the logit transform of the Series, where:

```python
logit(p) = log(p/(1-p))
```

Each value in the series should be strictly between 0 and 1. Use `error` to control the behavior when any series entries are outside of (0, 1).

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
>>> s.logit()
0   -2.197225
1    0.000000
2    2.197225
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |
| `error` | `str` | Determines behavior when `s` is outside of (0, 1). If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and `np.nan` is returned for the problematic entries; defaults to `'warn'`. | `'warn'` |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Exceptions:

| Type | Description |
| --- | --- |
| `RuntimeError` | Raised when there are values outside of (0, 1) in the Series and `error='raise'`. |

Source code in janitor/math.py

```python
@pf.register_series_method
def logit(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Take logit transform of the Series where:

        logit(p) = log(p/(1-p))

    Each value in the series should be between 0 and 1. Use error to
    control the behavior if any series entries are outside of (0, 1).

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
    >>> s.logit()
    0   -2.197225
    1    0.000000
    2    2.197225
    Name: numbers, dtype: float64

    :param s: Input Series.
    :param error: Determines behavior when s is outside of (0, 1).
        If 'warn' then a RuntimeWarning is thrown. If 'raise', then a
        RuntimeError is thrown. Otherwise, nothing is thrown and np.nan
        is returned for the problematic entries; defaults to 'warn'.
    :return: Transformed Series.
    :raises RuntimeError: if error is set to 'raise'.
    """
    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
    s[outside_support] = np.nan
    return scipy_logit(s)
```
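
As a worked check of the formula `logit(p) = log(p/(1-p))`, the sketch below evaluates it for the documented inputs using only the `math` module. `logit_sketch` is a hypothetical scalar helper, not janitor's implementation; it assumes the same open-interval support, so the endpoints 0 and 1 map to NaN.

```python
import math


def logit_sketch(p: float) -> float:
    """logit(p) = log(p / (1 - p)); NaN outside the open interval (0, 1)."""
    if not 0 < p < 1:
        return math.nan
    return math.log(p / (1 - p))


for p in (0.1, 0.5, 0.9):
    print(f"logit({p}) = {logit_sketch(p):.6f}")
# logit(0.1) = -2.197225
# logit(0.5) = 0.000000
# logit(0.9) = 2.197225
```

The outputs for 0.1 and 0.9 differ only in sign because `logit(1 - p) = -logit(p)`.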


## normal_cdf(s)

Transforms the Series via the CDF of the Normal distribution.

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 3], name="numbers")
>>> s.normal_cdf()
0    0.158655
1    0.500000
2    0.998650
dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Source code in janitor/math.py

```python
@pf.register_series_method
def normal_cdf(s: pd.Series) -> pd.Series:
    """
    Transforms the Series via the CDF of the Normal distribution.

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([-1, 0, 3], name="numbers")
    >>> s.normal_cdf()
    0    0.158655
    1    0.500000
    2    0.998650
    dtype: float64

    :param s: Input Series.
    :return: Transformed Series.
    """
    return pd.Series(norm.cdf(s), index=s.index)
```
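
The listing calls `norm.cdf` with its default location and scale, so the transform is the standard Normal CDF, which can be written as Φ(x) = (1 + erf(x / √2)) / 2. The sketch below is a standard-library illustration of that identity; `normal_cdf_sketch` is a hypothetical scalar helper, not janitor's implementation.

```python
import math
from statistics import NormalDist


def normal_cdf_sketch(x: float) -> float:
    """Standard Normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


for x in (-1, 0, 3):
    print(f"{normal_cdf_sketch(x):.6f}")
# 0.158655
# 0.500000
# 0.998650

# Cross-check against the standard library's own Normal distribution.
reference = NormalDist(mu=0.0, sigma=1.0)
max_gap = max(abs(normal_cdf_sketch(x) - reference.cdf(x)) for x in (-1, 0, 3))
```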


## probit(s, error='warn')

Transforms the Series via the inverse CDF of the Normal distribution.

Each value in the series should be strictly between 0 and 1. Use `error` to control the behavior when any series entries are outside of (0, 1).

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
>>> s.probit()
0   -1.281552
1    0.000000
2    0.841621
dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |
| `error` | `str` | Determines behavior when `s` is outside of (0, 1). If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and `np.nan` is returned for the problematic entries; defaults to `'warn'`. | `'warn'` |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Exceptions:

| Type | Description |
| --- | --- |
| `RuntimeError` | Raised when there are problematic values in the Series and `error='raise'`. |

Source code in janitor/math.py

```python
@pf.register_series_method
def probit(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Transforms the Series via the inverse CDF of the Normal distribution.

    Each value in the series should be between 0 and 1. Use error to
    control the behavior if any series entries are outside of (0, 1).

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
    >>> s.probit()
    0   -1.281552
    1    0.000000
    2    0.841621
    dtype: float64

    :param s: Input Series.
    :param error: Determines behavior when s is outside of (0, 1).
        If 'warn' then a RuntimeWarning is thrown. If 'raise', then
        a RuntimeError is thrown. Otherwise, nothing is thrown and np.nan
        is returned for the problematic entries; defaults to 'warn'.
    :raises RuntimeError: Raised when there are problematic values
        in the Series and error='raise'.
    :return: Transformed Series
    """
    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
    s[outside_support] = np.nan
    with np.errstate(all="ignore"):
        out = pd.Series(norm.ppf(s), index=s.index)
    return out
```
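
Because probit is the inverse of the standard Normal CDF, applying the CDF to a probit value should recover the original probability. The following standard-library sketch illustrates this round trip; `probit_sketch` is a hypothetical scalar helper built on `statistics.NormalDist`, not janitor's implementation, and it assumes NaN for inputs outside (0, 1).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_sketch(p: float) -> float:
    """Inverse standard Normal CDF; NaN outside the open interval (0, 1)."""
    if not 0 < p < 1:
        return math.nan
    return STD_NORMAL.inv_cdf(p)


for p in (0.1, 0.5, 0.8):
    z = probit_sketch(p)
    # The CDF undoes the probit transform up to floating-point error.
    print(f"probit({p}) = {z:.6f}, cdf(probit({p})) = {STD_NORMAL.cdf(z):.6f}")
```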


## sigmoid(s)

Take the sigmoid transform of the series where:

```python
sigmoid(x) = 1 / (1 + exp(-x))
```

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 4], name="numbers")
>>> s.sigmoid()
0    0.268941
1    0.500000
2    0.982014
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Source code in janitor/math.py

```python
@pf.register_series_method
def sigmoid(s: pd.Series) -> pd.Series:
    """
    Take the sigmoid transform of the series where:

        sigmoid(x) = 1 / (1 + exp(-x))

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([-1, 0, 4], name="numbers")
    >>> s.sigmoid()
    0    0.268941
    1    0.500000
    2    0.982014
    Name: numbers, dtype: float64

    :param s: Input Series.
    :return: Transformed Series.
    """
    return expit(s)
```
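
Evaluating `1 / (1 + exp(-x))` literally can overflow for large negative `x`. The sketch below shows one common way to avoid that with the standard library, and checks that the logit transform inverts the result. `sigmoid_sketch` is a hypothetical scalar helper, not janitor's implementation (the listing above delegates to `expit`).

```python
import math


def sigmoid_sketch(x: float) -> float:
    """sigmoid(x) = 1 / (1 + exp(-x)), evaluated without overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(-x) can overflow, so use the equivalent exp(x) form.
    z = math.exp(x)
    return z / (1.0 + z)


for x in (-1, 0, 4):
    print(f"{sigmoid_sketch(x):.6f}")
# 0.268941
# 0.500000
# 0.982014

# logit undoes sigmoid: log(p / (1 - p)) recovers the original input.
p = sigmoid_sketch(2.0)
recovered = math.log(p / (1 - p))
```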


## softmax(s)

Take the softmax transform of the series.

The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements.

That is, if x is a one-dimensional numpy array or pandas Series:

```python
softmax(x) = exp(x)/sum(exp(x))
```

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.softmax()
0    0.042010
1    0.114195
2    0.843795
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Source code in janitor/math.py

```python
@pf.register_series_method
def softmax(s: pd.Series) -> pd.Series:
    """
    Take the softmax transform of the series.

    The softmax function transforms each element of a collection by
    computing the exponential of each element divided by the sum of the
    exponentials of all the elements.

    That is, if x is a one-dimensional numpy array or pandas Series:

        softmax(x) = exp(x)/sum(exp(x))

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.softmax()
    0    0.042010
    1    0.114195
    2    0.843795
    Name: numbers, dtype: float64

    :param s: Input Series.
    :return: Transformed Series.
    """
    return scipy_softmax(s)
```
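
The formula `softmax(x) = exp(x)/sum(exp(x))` can be checked by hand with the standard library. The sketch below subtracts the maximum before exponentiating, a common stabilization that leaves the result unchanged; `softmax_sketch` is a hypothetical list-based helper, not janitor's implementation.

```python
import math
from typing import List, Sequence


def softmax_sketch(values: Sequence[float]) -> List[float]:
    """softmax(x) = exp(x) / sum(exp(x)), shifted by max(x) for stability."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_sketch([0, 1, 3])
print([f"{p:.6f}" for p in probs])  # ['0.042010', '0.114195', '0.843795']

# Adding a constant to every input does not change the output,
# which is why the shift by max(x) is safe.
shifted = softmax_sketch([1000, 1001, 1003])
```

The outputs are nonnegative and sum to 1, so they can be read as a probability distribution over the entries of the series.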


## z_score(s, moments_dict=None, keys=('mean', 'std'))

Transforms the Series into z-scores where:

```python
z = (s - s.mean()) / s.std()
```

```python
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.z_score()
0   -0.872872
1   -0.218218
2    1.091089
Name: numbers, dtype: float64
```



Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `s` | `Series` | Input Series. | required |
| `moments_dict` | `dict` | If not `None`, the mean and standard deviation used to compute the z-score transformation are saved as entries in `moments_dict`, with keys determined by the `keys` argument; defaults to `None`. | `None` |
| `keys` | `Tuple[str, str]` | Determines the keys saved in `moments_dict` if moments are saved; defaults to `('mean', 'std')`. | `('mean', 'std')` |

Returns:

| Type | Description |
| --- | --- |
| `Series` | Transformed Series. |

Source code in janitor/math.py

```python
@pf.register_series_method
def z_score(
    s: pd.Series,
    moments_dict: dict = None,
    keys: Tuple[str, str] = ("mean", "std"),
) -> pd.Series:
    """
    Transforms the Series into z-scores where:

        z = (s - s.mean()) / s.std()

    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.z_score()
    0   -0.872872
    1   -0.218218
    2    1.091089
    Name: numbers, dtype: float64

    :param s: Input Series.
    :param moments_dict: If not None, then the mean and standard
        deviation used to compute the z-score transformation are
        saved as entries in moments_dict with keys determined by
        the keys argument; defaults to None.
    :param keys: Determines the keys saved in moments_dict
        if moments are saved; defaults to ('mean', 'std').
    :return: Transformed Series.
    """
    mean = s.mean()
    std = s.std()
    if std == 0:
        return 0
    if moments_dict is not None:
        # keys[0] names the mean entry, keys[1] the standard deviation entry
        moments_dict[keys[0]] = mean
        moments_dict[keys[1]] = std
    return (s - mean) / std
```
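
One use for `moments_dict` is to standardize new data with the moments computed on an earlier series. The sketch below illustrates that idea with the standard library only; `z_score_sketch` is a hypothetical list-based helper, not janitor's implementation, and it omits the zero-variance branch shown in the listing. It assumes the sample standard deviation (divisor n - 1), which matches the pandas default for `Series.std()`.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Sequence, Tuple


def z_score_sketch(
    values: Sequence[float],
    moments_dict: Optional[Dict[str, float]] = None,
    keys: Tuple[str, str] = ("mean", "std"),
) -> List[float]:
    """Hypothetical list-based stand-in for Series.z_score()."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (divisor n - 1)
    if moments_dict is not None:
        moments_dict[keys[0]] = mu     # first key stores the mean
        moments_dict[keys[1]] = sigma  # second key stores the std
    return [(v - mu) / sigma for v in values]


moments: Dict[str, float] = {}
train_z = z_score_sketch([0, 1, 3], moments_dict=moments)
print([f"{z:.6f}" for z in train_z])  # ['-0.872872', '-0.218218', '1.091089']

# Reuse the saved moments to put new observations on the same scale.
new_z = [(v - moments["mean"]) / moments["std"] for v in [2, 5]]
```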