Math
Miscellaneous mathematical operators.
ecdf(s)
Return the empirical cumulative distribution (ECDF) of the values in a series.
Intended to be used with the following pattern:
df = pd.DataFrame(...)
# Obtain ECDF values to be plotted
x, y = df["column_name"].ecdf()
# Plot ECDF values
plt.scatter(x, y)
Null values must be dropped from the series, otherwise a `ValueError` is raised.
Also, if the `dtype` of the series is not numeric, a `TypeError` is raised.
>>> import pandas as pd
>>> import janitor
>>> df = pd.DataFrame({"numbers": [0, 4, 0, 1, 2, 1, 1, 3]})
>>> x, y = df["numbers"].ecdf()
>>> x
array([0, 0, 1, 1, 1, 2, 3, 4])
>>> y
array([0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. ])
Parameters:
Name | Type | Description | Default |
---|---|---|---|
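s | Series | A pandas series. `dtype` should be numeric. | required |
Returns:
Type | Description |
---|---|
Tuple[numpy.ndarray, numpy.ndarray] | `(x, y)`: `x` is the sorted array of values; `y` is the cumulative fraction of data points with value `x` or lower. |
Exceptions:
Type | Description |
---|---|
TypeError | if series is not numeric. |
ValueError | if series contains nulls. |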
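The arithmetic behind the example values can be sketched in plain Python. This is an illustration only, under stated assumptions: `ecdf_sketch` is a hypothetical helper that is not part of pyjanitor, and it works on a plain list rather than a pandas Series.

```python
import math
from typing import List, Sequence, Tuple


def ecdf_sketch(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Hypothetical plain-Python illustration of the ECDF computation."""
    # Mirror the documented null check: refuse NaN entries up front.
    if any(isinstance(v, float) and math.isnan(v) for v in values):
        raise ValueError("values contain nulls; drop them first")
    n = len(values)
    x = sorted(values)
    # The i-th smallest value (1-based) gets cumulative fraction i / n.
    y = [i / n for i in range(1, n + 1)]
    return x, y


x, y = ecdf_sketch([0, 4, 0, 1, 2, 1, 1, 3])
```

With the eight values from the example above, `y` steps up by 1/8 per observation, so tied values in `x` still receive distinct, increasing fractions.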
Source code in janitor/math.py
@pf.register_series_method
def ecdf(s: pd.Series) -> Tuple[np.ndarray, np.ndarray]:
    """
    Return cumulative distribution of values in a series.
    Intended to be used with the following pattern:
    ```python
    df = pd.DataFrame(...)
    # Obtain ECDF values to be plotted
    x, y = df["column_name"].ecdf()
    # Plot ECDF values
    plt.scatter(x, y)
    ```
    Null values must be dropped from the series,
    otherwise a `ValueError` is raised.
    Also, if the `dtype` of the series is not numeric,
    a `TypeError` is raised.
    >>> import pandas as pd
    >>> import janitor
    >>> df = pd.DataFrame({"numbers": [0, 4, 0, 1, 2, 1, 1, 3]})
    >>> x, y = df["numbers"].ecdf()
    >>> x
    array([0, 0, 1, 1, 1, 2, 3, 4])
    >>> y
    array([0.125, 0.25 , 0.375, 0.5 , 0.625, 0.75 , 0.875, 1. ])
    :param s: A pandas series. `dtype` should be numeric.
    :returns: `(x, y)`.
        `x`: sorted array of values.
        `y`: cumulative fraction of data points with value `x` or lower.
    :raises TypeError: if series is not numeric.
    :raises ValueError: if series contains nulls.
    """
    if not is_numeric_dtype(s):
        raise TypeError(f"series {s.name} must be numeric!")
    if not s.isna().sum() == 0:
        raise ValueError(f"series {s.name} contains nulls. Please drop them.")
    n = len(s)
    x = np.sort(s)
    y = np.arange(1, n + 1) / n
    return x, y
exp(s)
Take the exponential transform of the series.
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.exp()
0     1.000000
1     2.718282
2    20.085537
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | Series | Input Series. | required |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
Source code in janitor/math.py
@pf.register_series_method
def exp(s: pd.Series) -> pd.Series:
    """
    Take the exponential transform of the series.
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.exp()
    0     1.000000
    1     2.718282
    2    20.085537
    Name: numbers, dtype: float64
    :param s: Input Series.
    :return: Transformed Series.
    """
    return np.exp(s)
log(s, error='warn')
Take the natural logarithm of the Series.
Each value in the series should be positive. Use `error` to control the behavior if there are nonpositive entries in the series.
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.log(error="ignore")
0         NaN
1    0.000000
2    1.098612
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
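s | Series | Input Series. | required |
error | str | Determines behavior when taking the log of nonpositive entries. If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and the log of nonpositive values is `np.nan`. | 'warn' |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
Exceptions:
Type | Description |
---|---|
RuntimeError | Raised when there are nonpositive values in the Series and `error='raise'`. |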
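The three `error` modes can be illustrated with a small standard-library sketch. It assumes a plain list instead of a pandas Series, and `log_sketch` is a hypothetical name used only for this illustration, not a pyjanitor function.

```python
import math
import warnings
from typing import List, Sequence


def log_sketch(values: Sequence[float], error: str = "warn") -> List[float]:
    """Hypothetical illustration of the documented `error` modes."""
    nonpositive = [v <= 0 for v in values]
    count = sum(nonpositive)
    if count:
        msg = f"Log taken on {count} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
        # Any other value of `error` falls through silently.
    # Nonpositive entries become NaN; the rest get the natural log.
    return [
        math.nan if bad else math.log(v)
        for v, bad in zip(values, nonpositive)
    ]


result = log_sketch([0, 1, 3], error="ignore")
```

Here `result` holds NaN for the zero entry and the natural logs of 1 and 3 for the others, matching the shape of the example output above.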
Source code in janitor/math.py
@pf.register_series_method
def log(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Take natural logarithm of the Series.
    Each value in the series should be positive. Use `error` to control the
    behavior if there are nonpositive entries in the series.
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.log(error="ignore")
    0         NaN
    1    0.000000
    2    1.098612
    Name: numbers, dtype: float64
    :param s: Input Series.
    :param error: Determines behavior when taking the log of nonpositive
        entries. If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`,
        then a `RuntimeError` is thrown. Otherwise, nothing is thrown and
        log of nonpositive values is `np.nan`; defaults to `'warn'`.
    :raises RuntimeError: Raised when there are nonpositive values in the
        Series and `error='raise'`.
    :return: Transformed Series.
    """
    s = s.copy()
    nonpositive = s <= 0
    if (nonpositive).any():
        msg = f"Log taken on {nonpositive.sum()} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
        s[nonpositive] = np.nan
    return np.log(s)
logit(s, error='warn')
Take the logit transform of the Series, where:
logit(p) = log(p/(1-p))
Each value in the series should be strictly between 0 and 1. Use `error` to control the behavior if any series entries are outside of (0, 1).
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
>>> s.logit()
0   -2.197225
1    0.000000
2    2.197225
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
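s | Series | Input Series. | required |
error | str | Determines behavior when `s` is outside of `(0, 1)`. If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and `np.nan` is returned for the problematic entries. | 'warn' |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
Exceptions:
Type | Description |
---|---|
RuntimeError | if any entries are outside of `(0, 1)` and `error` is set to `'raise'`. |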
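As a worked instance of the formula, the following scalar sketch evaluates the log-odds for the three example values. `logit_sketch` is a hypothetical helper for illustration, not the library function, and it returns NaN outside the open interval (0, 1), as described for the non-raising modes.

```python
import math


def logit_sketch(p: float) -> float:
    """Hypothetical scalar helper: log-odds of a probability p."""
    # Outside the open interval (0, 1) the log-odds are undefined.
    if not 0 < p < 1:
        return math.nan
    return math.log(p / (1 - p))


values = [logit_sketch(p) for p in (0.1, 0.5, 0.9)]
```

The results are symmetric around 0.5: the logit of 0.1 is the negative of the logit of 0.9, and the logit of 0.5 is 0.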
Source code in janitor/math.py
@pf.register_series_method
def logit(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Take logit transform of the Series where:
    ```python
    logit(p) = log(p/(1-p))
    ```
    Each value in the series should be between 0 and 1. Use `error` to
    control the behavior if any series entries are outside of (0, 1).
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
    >>> s.logit()
    0   -2.197225
    1    0.000000
    2    2.197225
    Name: numbers, dtype: float64
    :param s: Input Series.
    :param error: Determines behavior when `s` is outside of `(0, 1)`.
        If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then a
        `RuntimeError` is thrown. Otherwise, nothing is thrown and `np.nan`
        is returned for the problematic entries; defaults to `'warn'`.
    :return: Transformed Series.
    :raises RuntimeError: if `error` is set to `'raise'`.
    """
    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
        s[outside_support] = np.nan
    return scipy_logit(s)
normal_cdf(s)
Transforms the Series via the CDF of the Normal distribution.
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 3], name="numbers")
>>> s.normal_cdf()
0    0.158655
1    0.500000
2    0.998650
dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | Series | Input Series. | required |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
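The example values can be cross-checked without pandas. The sketch below assumes the standard Normal distribution (mean 0, standard deviation 1) and uses the standard library's `statistics.NormalDist` as a stand-in; it is not how the library computes the result.

```python
from statistics import NormalDist

# Standard Normal: mean 0, standard deviation 1 (an assumption of this sketch).
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass to the left of each input value.
cdf_values = [standard_normal.cdf(x) for x in (-1, 0, 3)]
```

The middle value is one half because the distribution is symmetric around zero.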
Source code in janitor/math.py
@pf.register_series_method
def normal_cdf(s: pd.Series) -> pd.Series:
    """
    Transforms the Series via the CDF of the Normal distribution.
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([-1, 0, 3], name="numbers")
    >>> s.normal_cdf()
    0    0.158655
    1    0.500000
    2    0.998650
    dtype: float64
    :param s: Input Series.
    :return: Transformed Series.
    """
    return pd.Series(norm.cdf(s), index=s.index)
probit(s, error='warn')
Transforms the Series via the inverse CDF of the Normal distribution.
Each value in the series should be strictly between 0 and 1. Use `error` to control the behavior if any series entries are outside of (0, 1).
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
>>> s.probit()
0   -1.281552
1    0.000000
2    0.841621
dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
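s | Series | Input Series. | required |
error | str | Determines behavior when `s` is outside of `(0, 1)`. If `'warn'`, a `RuntimeWarning` is issued. If `'raise'`, a `RuntimeError` is raised. Otherwise, nothing is raised and `np.nan` is returned for the problematic entries. | 'warn' |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
Exceptions:
Type | Description |
---|---|
RuntimeError | Raised when there are problematic values in the Series and `error='raise'`. |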
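Because the probit is the inverse of the Normal CDF, applying the CDF to a probit value should recover the original probability. The sketch below checks this with the standard library's `statistics.NormalDist`, used here as a stand-in and assuming the standard Normal distribution; it does not reproduce the library's NaN handling.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, standard deviation 1

probabilities = (0.1, 0.5, 0.8)

# inv_cdf is the quantile function, i.e. the probit for a standard Normal.
# Unlike the documented behavior, it raises StatisticsError for p <= 0 or p >= 1.
probits = [standard_normal.inv_cdf(p) for p in probabilities]

# Applying the CDF again should undo the transform.
round_trip = [standard_normal.cdf(z) for z in probits]
```

Up to floating-point error, `round_trip` matches `probabilities`.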
Source code in janitor/math.py
@pf.register_series_method
def probit(s: pd.Series, error: str = "warn") -> pd.Series:
    """
    Transforms the Series via the inverse CDF of the Normal distribution.
    Each value in the series should be between 0 and 1. Use `error` to
    control the behavior if any series entries are outside of (0, 1).
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
    >>> s.probit()
    0   -1.281552
    1    0.000000
    2    0.841621
    dtype: float64
    :param s: Input Series.
    :param error: Determines behavior when `s` is outside of `(0, 1)`.
        If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then
        a `RuntimeError` is thrown. Otherwise, nothing is thrown and `np.nan`
        is returned for the problematic entries; defaults to `'warn'`.
    :raises RuntimeError: Raised when there are problematic values
        in the Series and `error='raise'`.
    :return: Transformed Series
    """
    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        if error.lower() == "raise":
            raise RuntimeError(msg)
        else:
            pass
        s[outside_support] = np.nan
    with np.errstate(all="ignore"):
        out = pd.Series(norm.ppf(s), index=s.index)
    return out
sigmoid(s)
Take the sigmoid transform of the series where:
sigmoid(x) = 1 / (1 + exp(-x))
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 4], name="numbers")
>>> s.sigmoid()
0    0.268941
1    0.500000
2    0.982014
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | Series | Input Series. | required |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
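The formula can be evaluated directly for the example inputs, and taking the log-odds of each result recovers the input, since the sigmoid and the logit are inverses of each other. `sigmoid_sketch` is a hypothetical scalar helper; this naive form is for illustration only, as `math.exp` overflows for very large negative `x`.

```python
import math


def sigmoid_sketch(x: float) -> float:
    """Hypothetical scalar helper: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


inputs = (-1, 0, 4)
values = [sigmoid_sketch(x) for x in inputs]

# The log-odds of each output should give back the original input.
recovered = [math.log(v / (1 - v)) for v in values]
```

Every output lies strictly between 0 and 1, which is why the results can be fed into the log-odds formula.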
Source code in janitor/math.py
@pf.register_series_method
def sigmoid(s: pd.Series) -> pd.Series:
    """
    Take the sigmoid transform of the series where:
    ```python
    sigmoid(x) = 1 / (1 + exp(-x))
    ```
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([-1, 0, 4], name="numbers")
    >>> s.sigmoid()
    0    0.268941
    1    0.500000
    2    0.982014
    Name: numbers, dtype: float64
    :param s: Input Series.
    :return: Transformed Series.
    """
    return expit(s)
softmax(s)
Take the softmax transform of the series.
The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements.
That is, if x is a one-dimensional numpy array or pandas Series:
softmax(x) = exp(x)/sum(exp(x))
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.softmax()
0    0.042010
1    0.114195
2    0.843795
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
s | Series | Input Series. | required |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |
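The definition translates into a few lines of plain Python. In this sketch `softmax_sketch` is a hypothetical helper, and subtracting the maximum before exponentiating is a common numerical-stability trick that is assumed here for illustration, not something the documentation above states about the library.

```python
import math
from typing import List, Sequence


def softmax_sketch(values: Sequence[float]) -> List[float]:
    """Hypothetical illustration of softmax(x) = exp(x) / sum(exp(x))."""
    # Subtracting the maximum leaves the result unchanged mathematically
    # but keeps exp() from overflowing on large inputs.
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


weights = softmax_sketch([0, 1, 3])
```

The outputs are positive and sum to 1, and adding the same constant to every input leaves them unchanged.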
Source code in janitor/math.py
@pf.register_series_method
def softmax(s: pd.Series) -> pd.Series:
    """
    Take the softmax transform of the series.
    The softmax function transforms each element of a collection by
    computing the exponential of each element divided by the sum of the
    exponentials of all the elements.
    That is, if x is a one-dimensional numpy array or pandas Series:
    ```python
    softmax(x) = exp(x)/sum(exp(x))
    ```
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.softmax()
    0    0.042010
    1    0.114195
    2    0.843795
    Name: numbers, dtype: float64
    :param s: Input Series.
    :return: Transformed Series.
    """
    return scipy_softmax(s)
z_score(s, moments_dict=None, keys=('mean', 'std'))
Transforms the Series into z-scores where:
z = (s - s.mean()) / s.std()
>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.z_score()
0   -0.872872
1   -0.218218
2    1.091089
Name: numbers, dtype: float64
Parameters:
Name | Type | Description | Default |
---|---|---|---|
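s | Series | Input Series. | required |
moments_dict | dict | If not `None`, then the mean and standard deviation used to compute the z-score transformation are saved as entries in `moments_dict` with keys determined by the `keys` argument. | None |
keys | Tuple[str, str] | Determines the keys saved in `moments_dict` if moments are saved. | ('mean', 'std') |
Returns:
Type | Description |
---|---|
Series | Transformed Series. |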
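One use for `moments_dict` is to keep the mean and standard deviation so the transform can be undone later. The sketch below illustrates the idea on a plain list; `z_score_sketch` and the key names `"mu"` and `"sigma"` are hypothetical choices for this example. It uses the sample standard deviation (n - 1 in the denominator), which matches the default of pandas' `Series.std`.

```python
import statistics
from typing import Dict, List, Optional, Sequence, Tuple


def z_score_sketch(
    values: Sequence[float],
    moments: Optional[Dict[str, float]] = None,
    keys: Tuple[str, str] = ("mean", "std"),
) -> List[float]:
    """Hypothetical illustration of z = (x - mean) / std with saved moments."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    if moments is not None:
        # Store the moments under the caller-chosen keys.
        moments[keys[0]] = mean
        moments[keys[1]] = std
    return [(v - mean) / std for v in values]


data = [0.0, 1.0, 3.0]
moments: Dict[str, float] = {}
z = z_score_sketch(data, moments, keys=("mu", "sigma"))

# Invert the transform using the saved moments.
restored = [v * moments["sigma"] + moments["mu"] for v in z]
```

This sketch does not handle a zero standard deviation and would fail with a division error on constant data. In the library source listed for this function, that case returns the scalar `0` before anything is stored in `moments_dict`.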
Source code in janitor/math.py
@pf.register_series_method
def z_score(
    s: pd.Series,
    moments_dict: dict = None,
    keys: Tuple[str, str] = ("mean", "std"),
) -> pd.Series:
    """
    Transforms the Series into z-scores where:
    ```python
    z = (s - s.mean()) / s.std()
    ```
    >>> import pandas as pd
    >>> import janitor
    >>> s = pd.Series([0, 1, 3], name="numbers")
    >>> s.z_score()
    0   -0.872872
    1   -0.218218
    2    1.091089
    Name: numbers, dtype: float64
    :param s: Input Series.
    :param moments_dict: If not `None`, then the mean and standard
        deviation used to compute the z-score transformation is
        saved as entries in `moments_dict` with keys determined by
        the `keys` argument; defaults to `None`.
    :param keys: Determines the keys saved in `moments_dict`
        if moments are saved; defaults to (`'mean'`, `'std'`).
    :return: Transformed Series.
    """
    mean = s.mean()
    std = s.std()
    if std == 0:
        return 0
    if moments_dict is not None:
        moments_dict[keys[0]] = mean
        moments_dict[keys[1]] = std
    return (s - mean) / std