Math

Miscellaneous mathematical operators.

ecdf(s)

Return the empirical cumulative distribution (ECDF) of the values in a series.

Null values must be dropped from the series beforehand; otherwise, a ValueError is raised.

If the dtype of the series is not numeric, a TypeError is raised.

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 4, 0, 1, 2, 1, 1, 3])
>>> x, y = s.ecdf()
>>> x
array([0, 0, 1, 1, 1, 2, 3, 4])
>>> y
array([0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])

You can then plot the ECDF values, for example:

>>> from matplotlib import pyplot as plt
>>> plt.scatter(x, y)

Parameters:

Name Type Description Default
s Series

A pandas series. dtype should be numeric.

required

Raises:

Type Description
TypeError

If series is not numeric.

ValueError

If series contains nulls.

Returns:

Name Type Description
x ndarray

Sorted array of values.

y ndarray

Cumulative fraction of data points with value x or lower.
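
The computation behind x and y is small enough to sketch without pandas or NumPy. The following is a dependency-free illustration of the same idea, not the library's implementation: `ecdf_sketch` is a hypothetical helper name, it returns plain lists rather than ndarrays, and it omits the numeric dtype check.

```python
def ecdf_sketch(values):
    """Return (x, y): the sorted values and the cumulative fraction at each one."""
    # NaN is the only float that is not equal to itself.
    if any(v is None or v != v for v in values):
        raise ValueError("values contain nulls; drop them first")
    n = len(values)
    x = sorted(values)
    # The i-th smallest value (1-based) has i of the n points at or below it.
    y = [(i + 1) / n for i in range(n)]
    return x, y


x, y = ecdf_sketch([0, 4, 0, 1, 2, 1, 1, 3])
print(x)
print(y)
```

Tied values each receive their own step, so the fraction reported for a repeated value is largest at its last occurrence.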

Source code in janitor/math.py
@pf.register_series_method
def ecdf(s: "Series") -> Tuple["ndarray", "ndarray"]:
    """Return cumulative distribution of values in a series.

    Null values must be dropped from the series,
    otherwise a `ValueError` is raised.

    Also, if the `dtype` of the series is not numeric,
    a `TypeError` is raised.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0, 4, 0, 1, 2, 1, 1, 3])
        >>> x, y = s.ecdf()
        >>> x  # doctest: +SKIP
        array([0, 0, 1, 1, 1, 2, 3, 4])
        >>> y  # doctest: +SKIP
        array([0.125, 0.25 , 0.375, 0.5  , 0.625, 0.75 , 0.875, 1.   ])

        You can then plot the ECDF values, for example:

        >>> from matplotlib import pyplot as plt
        >>> plt.scatter(x, y)  # doctest: +SKIP

    Args:
        s: A pandas series. `dtype` should be numeric.

    Raises:
        TypeError: If series is not numeric.
        ValueError: If series contains nulls.

    Returns:
        x: Sorted array of values.
        y: Cumulative fraction of data points with value `x` or lower.
    """
    import numpy as np
    import pandas.api.types as pdtypes

    if not pdtypes.is_numeric_dtype(s):
        raise TypeError(f"series {s.name} must be numeric!")
    if s.isna().any():
        raise ValueError(f"series {s.name} contains nulls. Please drop them.")

    n = len(s)
    x = np.sort(s)
    y = np.arange(1, n + 1) / n

    return x, y

exp(s)

Take the exponential transform of the series.

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.exp()
0     1.000000
1     2.718282
2    20.085537
Name: numbers, dtype: float64
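
As a rough cross-check of the output above, the same values can be reproduced with the standard library alone. This is only an illustrative sketch (the library itself calls `np.exp`, per the source on this page); `values` and `transformed` are names made up for the example.

```python
import math

values = [0, 1, 3]
# Apply e**v to each element, which is what the exponential transform does.
transformed = [math.exp(v) for v in values]

# Rounded to six decimals for comparison with the Series printed above.
print([round(t, 6) for t in transformed])
```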

Parameters:

Name Type Description Default
s Series

Input Series.

required

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def exp(s: "Series") -> "Series":
    """Take the exponential transform of the series.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0, 1, 3], name="numbers")
        >>> s.exp()
        0     1.000000
        1     2.718282
        2    20.085537
        Name: numbers, dtype: float64

    Args:
        s: Input Series.

    Returns:
        Transformed Series.
    """
    import numpy as np

    return np.exp(s)

log(s, error='warn')

Take natural logarithm of the Series.

Each value in the series should be positive. Use error to control the behavior if there are nonpositive entries in the series.

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.log(error="ignore")
0         NaN
1    0.000000
2    1.098612
Name: numbers, dtype: float64
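
The three `error` modes can be illustrated with a dependency-free sketch. It follows the control flow of the source listed on this page but works on a plain list; `log_sketch` is a hypothetical name and is not part of janitor.

```python
import math
import warnings


def log_sketch(values, error="warn"):
    """Natural log of each value; nonpositive entries become NaN."""
    nonpositive = [v <= 0 for v in values]
    if any(nonpositive):
        msg = f"Log taken on {sum(nonpositive)} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
        # Any other value of `error` falls through silently.
    return [
        math.nan if bad else math.log(v)
        for v, bad in zip(values, nonpositive)
    ]


# "ignore" is not a special keyword; it is simply neither "warn" nor "raise".
result = log_sketch([0, 1, 3], error="ignore")
print(result)
```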

Parameters:

Name Type Description Default
s Series

Input Series.

required
error str

Determines behavior when taking the log of nonpositive entries. If 'warn', a RuntimeWarning is issued. If 'raise', a RuntimeError is raised. Otherwise, nothing is raised and the log of nonpositive values is np.nan.

'warn'

Raises:

Type Description
RuntimeError

Raised when there are nonpositive values in the Series and error='raise'.

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def log(s: "Series", error: str = "warn") -> "Series":
    """
    Take natural logarithm of the Series.

    Each value in the series should be positive. Use `error` to control the
    behavior if there are nonpositive entries in the series.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0, 1, 3], name="numbers")
        >>> s.log(error="ignore")
        0         NaN
        1    0.000000
        2    1.098612
        Name: numbers, dtype: float64

    Args:
        s: Input Series.
        error: Determines behavior when taking the log of nonpositive
            entries. If `'warn'`, a `RuntimeWarning` is issued. If
            `'raise'`, a `RuntimeError` is raised. Otherwise, nothing
            is raised and the log of nonpositive values is `np.nan`.

    Raises:
        RuntimeError: Raised when there are nonpositive values in the
            Series and `error='raise'`.

    Returns:
        Transformed Series.
    """
    import numpy as np

    s = s.copy()
    nonpositive = s <= 0
    if (nonpositive).any():
        msg = f"Log taken on {nonpositive.sum()} nonpositive value(s)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
    s[nonpositive] = np.nan
    return np.log(s)

logit(s, error='warn')

Take logit transform of the Series.

The logit transform is defined:

logit(p) = log(p/(1-p))

Each value in the series should be strictly between 0 and 1. Use error to control the behavior if any series entries are outside of (0, 1).

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
>>> s.logit()
0   -2.197225
1    0.000000
2    2.197225
Name: numbers, dtype: float64
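
To make the definition concrete, here is a minimal standard-library sketch of logit(p) = log(p/(1-p)) for a single value. It is an illustration only (the source on this page delegates to `scipy.special.logit`), and `logit_sketch` is a hypothetical name.

```python
import math


def logit_sketch(p):
    """Log-odds of p; values outside the open interval (0, 1) map to NaN."""
    if p <= 0 or p >= 1:
        return math.nan
    return math.log(p / (1 - p))


values = [logit_sketch(p) for p in (0.1, 0.5, 0.9)]
# Rounded to six decimals for comparison with the Series printed above.
print([round(v, 6) for v in values])
```

The result is antisymmetric around p = 0.5, which is why 0.1 and 0.9 give values of equal magnitude and opposite sign.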

Parameters:

Name Type Description Default
s Series

Input Series.

required
error str

Determines behavior when any entry of s is outside of (0, 1). If 'warn', a RuntimeWarning is issued. If 'raise', a RuntimeError is raised. Otherwise, nothing is raised and np.nan is returned for the problematic entries; defaults to 'warn'.

'warn'

Raises:

Type Description
RuntimeError

When there are values outside of (0, 1) in the Series and error='raise'.

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def logit(s: "Series", error: str = "warn") -> "Series":
    """Take logit transform of the Series.

    The logit transform is defined:

    ```python
    logit(p) = log(p/(1-p))
    ```

    Each value in the series should be between 0 and 1. Use `error` to
    control the behavior if any series entries are outside of (0, 1).

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0.1, 0.5, 0.9], name="numbers")
        >>> s.logit()
        0   -2.197225
        1    0.000000
        2    2.197225
        Name: numbers, dtype: float64

    Args:
        s: Input Series.
        error: Determines behavior when `s` is outside of `(0, 1)`.
            If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then a
            `RuntimeError` is thrown. Otherwise, nothing is thrown and `np.nan`
            is returned for the problematic entries; defaults to `'warn'`.

    Raises:
        RuntimeError: When there are values outside of `(0, 1)`
            in the Series and `error='raise'`.

    Returns:
        Transformed Series.
    """
    import numpy as np
    import scipy

    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
    s[outside_support] = np.nan
    return scipy.special.logit(s)

normal_cdf(s)

Transforms the Series via the CDF of the Normal distribution.

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 3], name="numbers")
>>> s.normal_cdf()
0    0.158655
1    0.500000
2    0.998650
dtype: float64
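
For intuition, the standard Normal CDF can be written with the error function as Phi(x) = (1 + erf(x / sqrt(2))) / 2. The sketch below uses that identity with the standard library only. It is not how the library computes the result (the source on this page calls `scipy.stats.norm.cdf`), and `normal_cdf_sketch` is a hypothetical name.

```python
import math


def normal_cdf_sketch(x):
    """CDF of the standard Normal distribution (mean 0, standard deviation 1)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


values = [normal_cdf_sketch(x) for x in (-1, 0, 3)]
# Rounded to six decimals for comparison with the Series printed above.
print([round(v, 6) for v in values])
```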

Parameters:

Name Type Description Default
s Series

Input Series.

required

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def normal_cdf(s: "Series") -> "Series":
    """Transforms the Series via the CDF of the Normal distribution.

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([-1, 0, 3], name="numbers")
        >>> s.normal_cdf()
        0    0.158655
        1    0.500000
        2    0.998650
        dtype: float64

    Args:
        s: Input Series.

    Returns:
        Transformed Series.
    """
    import pandas as pd
    import scipy

    return pd.Series(scipy.stats.norm.cdf(s), index=s.index)

probit(s, error='warn')

Transforms the Series via the inverse CDF of the Normal distribution.

Each value in the series should be strictly between 0 and 1. Use error to control the behavior if any series entries are outside of (0, 1).

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
>>> s.probit()
0   -1.281552
1    0.000000
2    0.841621
dtype: float64
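
The inverse Normal CDF is also available in the Python standard library (3.8 or later) as `statistics.NormalDist.inv_cdf`, which makes for a convenient cross-check. The sketch below is an illustration under that assumption, not the library's implementation (the source on this page calls `scipy.stats.norm.ppf`); `probit_sketch` is a hypothetical name.

```python
import math
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
_STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def probit_sketch(p):
    """Inverse standard Normal CDF; values outside (0, 1) map to NaN."""
    if p <= 0 or p >= 1:
        return math.nan
    return _STANDARD_NORMAL.inv_cdf(p)


values = [probit_sketch(p) for p in (0.1, 0.5, 0.8)]
# Rounded to six decimals for comparison with the Series printed above.
print([round(v, 6) for v in values])

# Applying the CDF to a probit value recovers the original probability.
round_trip = _STANDARD_NORMAL.cdf(probit_sketch(0.8))
```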

Parameters:

Name Type Description Default
s Series

Input Series.

required
error str

Determines behavior when any entry of s is outside of (0, 1). If 'warn', a RuntimeWarning is issued. If 'raise', a RuntimeError is raised. Otherwise, nothing is raised and np.nan is returned for the problematic entries.

'warn'

Raises:

Type Description
RuntimeError

When there are problematic values in the Series and error='raise'.

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def probit(s: "Series", error: str = "warn") -> "Series":
    """Transforms the Series via the inverse CDF of the Normal distribution.

    Each value in the series should be between 0 and 1. Use `error` to
    control the behavior if any series entries are outside of (0, 1).

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0.1, 0.5, 0.8], name="numbers")
        >>> s.probit()
        0   -1.281552
        1    0.000000
        2    0.841621
        dtype: float64

    Args:
        s: Input Series.
        error: Determines behavior when `s` is outside of `(0, 1)`.
            If `'warn'` then a `RuntimeWarning` is thrown. If `'raise'`, then
            a `RuntimeError` is thrown. Otherwise, nothing is thrown and
            `np.nan` is returned for the problematic entries.

    Raises:
        RuntimeError: When there are problematic values
            in the Series and `error='raise'`.

    Returns:
        Transformed Series.
    """
    import numpy as np
    import pandas as pd
    import scipy

    s = s.copy()
    outside_support = (s <= 0) | (s >= 1)
    if (outside_support).any():
        msg = f"{outside_support.sum()} value(s) are outside of (0, 1)"
        if error.lower() == "warn":
            warnings.warn(msg, RuntimeWarning)
        elif error.lower() == "raise":
            raise RuntimeError(msg)
    s[outside_support] = np.nan
    with np.errstate(all="ignore"):
        out = pd.Series(scipy.stats.norm.ppf(s), index=s.index)
    return out

sigmoid(s)

Take the sigmoid transform of the series.

The sigmoid function is defined:

sigmoid(x) = 1 / (1 + exp(-x))

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([-1, 0, 4], name="numbers")
>>> s.sigmoid()
0    0.268941
1    0.500000
2    0.982014
Name: numbers, dtype: float64
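
The formula can be sketched for a single value with the standard library. This is an illustration rather than the library's implementation (the source on this page calls `scipy.special.expit`); `sigmoid_sketch` and `logit_for_check` are hypothetical names. The branch on the sign of x is a common way to keep `math.exp` from overflowing on large negative inputs.

```python
import math


def sigmoid_sketch(x):
    """1 / (1 + exp(-x)), arranged so exp never sees a large positive argument."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so z lies in [0, 1) and cannot overflow
    return z / (1.0 + z)


def logit_for_check(p):
    """Log-odds of p, used here only to show the inverse relationship."""
    return math.log(p / (1 - p))


values = [sigmoid_sketch(x) for x in (-1, 0, 4)]
# Rounded to six decimals for comparison with the Series printed above.
print([round(v, 6) for v in values])

# sigmoid undoes logit (up to floating-point rounding).
recovered = sigmoid_sketch(logit_for_check(0.3))
```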

Parameters:

Name Type Description Default
s Series

Input Series.

required

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def sigmoid(s: "Series") -> "Series":
    """Take the sigmoid transform of the series.

    The sigmoid function is defined:

    ```python
    sigmoid(x) = 1 / (1 + exp(-x))
    ```

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([-1, 0, 4], name="numbers")
        >>> s.sigmoid()
        0    0.268941
        1    0.500000
        2    0.982014
        Name: numbers, dtype: float64

    Args:
        s: Input Series.

    Returns:
        Transformed Series.
    """
    import scipy

    return scipy.special.expit(s)

softmax(s)

Take the softmax transform of the series.

The softmax function transforms each element of a collection by computing the exponential of each element divided by the sum of the exponentials of all the elements.

That is, if x is a one-dimensional numpy array or pandas Series:

softmax(x) = exp(x)/sum(exp(x))

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.softmax()
0    0.042010
1    0.114195
2    0.843795
Name: numbers, dtype: float64
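
The definition translates almost directly into plain Python. The sketch below is illustrative only (the source on this page calls `scipy.special.softmax`); `softmax_sketch` is a hypothetical name. Subtracting the maximum before exponentiating is a standard trick that leaves the result unchanged mathematically while avoiding overflow.

```python
import math


def softmax_sketch(values):
    """exp(x) / sum(exp(x)) over a non-empty list of numbers."""
    shift = max(values)
    # exp(v - shift) / sum(exp(w - shift)) equals exp(v) / sum(exp(w)).
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax_sketch([0, 1, 3])
# Rounded to six decimals for comparison with the Series printed above.
print([round(p, 6) for p in probs])
```

The outputs are all positive and sum to 1, so they can be read as a probability distribution over the input positions.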

Parameters:

Name Type Description Default
s Series

Input Series.

required

Returns:

Type Description
Series

Transformed Series.

Source code in janitor/math.py
@pf.register_series_method
def softmax(s: "Series") -> "Series":
    """Take the softmax transform of the series.

    The softmax function transforms each element of a collection by
    computing the exponential of each element divided by the sum of the
    exponentials of all the elements.

    That is, if x is a one-dimensional numpy array or pandas Series:

    ```python
    softmax(x) = exp(x)/sum(exp(x))
    ```

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0, 1, 3], name="numbers")
        >>> s.softmax()
        0    0.042010
        1    0.114195
        2    0.843795
        Name: numbers, dtype: float64

    Args:
        s: Input Series.

    Returns:
        Transformed Series.
    """
    import pandas as pd
    import scipy

    return pd.Series(scipy.special.softmax(s), index=s.index, name=s.name)

z_score(s, moments_dict=None, keys=('mean', 'std'))

Transforms the Series into z-scores.

The z-score is defined:

z = (s - s.mean()) / s.std()

Examples:

>>> import pandas as pd
>>> import janitor
>>> s = pd.Series([0, 1, 3], name="numbers")
>>> s.z_score()
0   -0.872872
1   -0.218218
2    1.091089
Name: numbers, dtype: float64
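
The sketch below reproduces the example with the standard library and shows how `moments_dict` is filled in. It is an illustration under stated assumptions, not the library's code: `z_score_sketch` is a hypothetical name, it operates on a list of at least two values, and it uses `statistics.stdev` (the sample standard deviation, ddof=1), which matches the default of pandas' `Series.std`.

```python
import statistics


def z_score_sketch(values, moments_dict=None, keys=("mean", "std")):
    """Return (v - mean) / std for each value, optionally recording the moments."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (ddof=1)
    if std == 0:
        # Mirrors the source listed on this page, which returns the scalar 0 here.
        return 0
    if moments_dict is not None:
        moments_dict[keys[0]] = mean
        moments_dict[keys[1]] = std
    return [(v - mean) / std for v in values]


moments = {}
z = z_score_sketch([0.0, 1.0, 3.0], moments_dict=moments)
# Rounded to six decimals for comparison with the Series printed above.
print([round(v, 6) for v in z])
print(moments)
```

Saving the moments is useful when the same centering and scaling must later be applied to new data, for example a held-out test set.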

Parameters:

Name Type Description Default
s Series

Input Series.

required
moments_dict dict

If not None, then the mean and standard deviation used to compute the z-score transformation are saved as entries in moments_dict with keys determined by the keys argument; defaults to None.

None
keys Tuple[str, str]

Determines the keys saved in moments_dict if moments are saved; defaults to ('mean', 'std').

('mean', 'std')

Returns:

Type Description
Series

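Transformed Series. Per the source below, if the standard deviation is 0, the scalar 0 is returned instead and moments_dict is left unchanged.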

Source code in janitor/math.py
@pf.register_series_method
def z_score(
    s: "Series",
    moments_dict: dict = None,
    keys: Tuple[str, str] = ("mean", "std"),
) -> "Series":
    """Transforms the Series into z-scores.

    The z-score is defined:

    ```python
    z = (s - s.mean()) / s.std()
    ```

    Examples:
        >>> import pandas as pd
        >>> import janitor
        >>> s = pd.Series([0, 1, 3], name="numbers")
        >>> s.z_score()
        0   -0.872872
        1   -0.218218
        2    1.091089
        Name: numbers, dtype: float64

    Args:
        s: Input Series.
        moments_dict: If not `None`, then the mean and standard
            deviation used to compute the z-score transformation are
            saved as entries in `moments_dict` with keys determined by
            the `keys` argument; defaults to `None`.
        keys: Determines the keys saved in `moments_dict`
            if moments are saved; defaults to (`'mean'`, `'std'`).

    Returns:
        Transformed Series. If the standard deviation is 0, the scalar
        `0` is returned instead.
    """
    mean = s.mean()
    std = s.std()
    if std == 0:
        return 0
    if moments_dict is not None:
        moments_dict[keys[0]] = mean
        moments_dict[keys[1]] = std
    return (s - mean) / std