Slicing

Slicing is a new feature introduced in v2. It resembles pandas.cut(), which bins data into discrete intervals, and pandas.Series.groupby(), which splits the data, applies a function, and combines the results.
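The pandas analogy can be made concrete with a small sketch. The data values and bin edges below are made up for illustration; only pandas.cut() and pandas.Series.groupby() come from the text above.

```python
import pandas as pd

# Illustrative data (an assumption, not taken from the staircase docs).
values = pd.Series([1, 4, 6, 9, 12])

# pandas.cut assigns each value to one of the intervals (0, 5], (5, 10], (10, 15].
bins = pd.cut(values, bins=[0, 5, 10, 15])

# groupby splits the values by interval, applies mean, and combines the results.
means = values.groupby(bins, observed=False).mean()
print(means)
```

Slicing applies the same split, apply, combine idea to a step function rather than to a collection of discrete values.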

The method staircase.Stairs.slice() is designed to slice a step function into discrete intervals, apply a function, and combine the results. In most cases the result will be a pandas.Series, but some slice functions return something different. The slice method has a parameter cut, which can be a sequence, a pandas.IntervalIndex, or a pandas.PeriodIndex; the latter is only applicable for datetime domains. The cut parameter is almost identical to the bins parameter in pandas.cut(): it provides the interval bounds used to slice the step function.

Examples:

In [1]: import staircase as sc

In [2]: import pandas as pd

In [3]: df = sc.make_test_data(seed=42)

In [4]: sf = sc.Stairs(df, "start", "end")

In [5]: sf.plot()
Out[5]: <AxesSubplot:>
[Figure: plot of sf (../_images/slicing1.png)]

In [6]: sf_sliced = sf.slice(pd.period_range("2021", "2022"))

In [7]: sf_sliced.mean()
Out[7]: 
[2021-01-01, 2021-01-02)    24.425000
[2021-01-02, 2021-01-03)    24.000000
[2021-01-03, 2021-01-04)    24.821528
[2021-01-04, 2021-01-05)    24.507639
[2021-01-05, 2021-01-06)    25.000000
                              ...    
[2021-12-28, 2021-12-29)    15.427083
[2021-12-29, 2021-12-30)    14.314583
[2021-12-30, 2021-12-31)    13.548611
[2021-12-31, 2022-01-01)    13.000000
[2022-01-01, 2022-01-02)    13.000000
Length: 366, dtype: float64

In [8]: sf_sliced.median()
Out[8]: 
[2021-01-01, 2021-01-02)    24.0
[2021-01-02, 2021-01-03)    24.0
[2021-01-03, 2021-01-04)    25.0
[2021-01-04, 2021-01-05)    25.0
[2021-01-05, 2021-01-06)    25.0
                            ... 
[2021-12-28, 2021-12-29)    15.0
[2021-12-29, 2021-12-30)    14.0
[2021-12-30, 2021-12-31)    14.0
[2021-12-31, 2022-01-01)    13.0
[2022-01-01, 2022-01-02)    13.0
Length: 366, dtype: float64

In the above example sf_sliced is a staircase.StairSlicer object. This object exposes many intuitive methods which can be performed on the “slices”. If several methods are to be performed, it may be wise to assign the StairSlicer object to a variable. This is not necessary though, as the example below demonstrates. The staircase.StairSlicer.agg() method can be used to perform multiple statistical operations in one method call.

In [9]: df = sc.make_test_data(dates=False, seed=42)

In [10]: sf = sc.Stairs(df, "start", "end")

In [11]: sf.plot()
Out[11]: <AxesSubplot:>
[Figure: plot of sf (../_images/slicing2.png)]

In [12]: ii = pd.IntervalIndex.from_breaks(range(0, 101, 5))

In [13]: sf.slice(ii).agg(["min", "max"])
Out[13]: 
            min   max
(0, 5]     22.0  25.0
(5, 10]    15.0  22.0
(10, 15]   14.0  18.0
(15, 20]   14.0  17.0
(20, 25]   12.0  17.0
(25, 30]   14.0  18.0
(30, 35]   11.0  16.0
(35, 40]   11.0  15.0
(40, 45]   13.0  18.0
(45, 50]   14.0  18.0
(50, 55]   14.0  24.0
(55, 60]   17.0  21.0
(60, 65]   18.0  22.0
(65, 70]   10.0  18.0
(70, 75]   10.0  15.0
(75, 80]   13.0  20.0
(80, 85]   20.0  24.0
(85, 90]   13.0  24.0
(90, 95]   12.0  18.0
(95, 100]  13.0  17.0

A major point of difference between staircase.Stairs.slice() and pandas.Series.groupby() is that the intervals used to slice a step function may overlap, and they need not cover the domain. This is demonstrated in the following trivial examples:

In [14]: ii = pd.IntervalIndex.from_arrays([0]*5, [100]*5)

In [15]: sf.slice(ii).mode()
Out[15]: 
(0, 100]    15.0
(0, 100]    15.0
(0, 100]    15.0
(0, 100]    15.0
(0, 100]    15.0
dtype: float64

In [16]: ii = pd.IntervalIndex.from_tuples([(0,10),  (40,50)])

In [17]: sf.slice(ii).integral()
Out[17]: 
(0, 10]     206.892393
(40, 50]    156.092672
dtype: float64
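For contrast, the sketch below shows how pandas.cut() treats the same two situations when binning discrete values. The sample values are made up for illustration. Intervals that leave gaps are accepted, but uncovered values become NaN, while an overlapping IntervalIndex is rejected with a ValueError.

```python
import pandas as pd

# Illustrative values (an assumption, not taken from the staircase docs).
values = pd.Series([1, 5, 45, 70])

# Intervals that do not cover the domain: 70 falls in no interval and becomes NaN.
gappy = pd.IntervalIndex.from_tuples([(0, 10), (40, 50)])
binned = pd.cut(values, gappy)
print(binned)

# Overlapping intervals: pandas.cut refuses them.
overlapping = pd.IntervalIndex.from_tuples([(0, 10), (5, 15)])
try:
    pd.cut(values, overlapping)
    overlap_accepted = True
except ValueError as err:
    overlap_accepted = False
    print("pandas.cut raised:", err)
```

A discrete value must land in exactly one bin, whereas each slice of a step function is evaluated independently of the others, which is why overlaps and gaps pose no problem for slicing.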

There are several methods, beyond simple summary statistics, that staircase.StairSlicer provides. These include staircase.StairSlicer.apply(), which functions similarly to pandas.Series.apply() and allows any function that takes a Stairs object as its first argument to be applied to the slices:

In [18]: def count_steps(s):
   ....:     return s.number_of_steps
   ....: 

In [19]: ii = pd.IntervalIndex.from_breaks(range(0, 101, 5))

In [20]: sf.slice(ii).apply(count_steps)
Out[20]: 
(0, 5]       19
(5, 10]      23
(10, 15]     16
(15, 20]     17
(20, 25]     11
(25, 30]     15
(30, 35]     15
(35, 40]     13
(40, 45]     20
(45, 50]     15
(50, 55]     18
(55, 60]     19
(60, 65]     14
(65, 70]     22
(70, 75]     11
(75, 80]     21
(80, 85]     17
(85, 90]     17
(90, 95]     22
(95, 100]    13
dtype: int64
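The idea behind applying an arbitrary function to slices can be sketched without staircase at all. The following is a conceptual illustration only, not how staircase implements it: the step function is assumed to be a sorted, contiguous list of (start, end, value) pieces, and clip, slice_apply and count_changes are hypothetical helper names.

```python
def clip(pieces, lo, hi):
    """Restrict a list of (start, end, value) pieces to the interval [lo, hi)."""
    clipped = []
    for start, end, value in pieces:
        s, e = max(start, lo), min(end, hi)
        if s < e:  # keep only pieces that actually intersect the interval
            clipped.append((s, e, value))
    return clipped


def slice_apply(pieces, intervals, func):
    """Clip the step function to each interval, then apply func to each clipped slice."""
    return {interval: func(clip(pieces, *interval)) for interval in intervals}


def count_changes(clipped):
    """Count the points inside a slice where the value changes (assumes contiguous pieces)."""
    return sum(1 for a, b in zip(clipped, clipped[1:]) if a[2] != b[2])


# Illustrative step function: 1 on [0, 2), 3 on [2, 7), 2 on [7, 10).
pieces = [(0, 2, 1.0), (2, 5, 3.0), (5, 7, 3.0), (7, 10, 2.0)]

# The intervals may overlap, since each slice is processed independently.
changes = slice_apply(pieces, [(0, 5), (5, 10), (1, 8)], count_changes)
print(changes)
```

Because the function only ever sees one clipped slice at a time, any callable that accepts a slice can be plugged in, which mirrors the role of the function passed to apply.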

The concept of resampling a step function was introduced in staircase v1. In v2 resampling is achieved by slicing, applying a function which returns a number, then producing a new step function by replacing the slice intervals with those values (see staircase.StairSlicer.resample()). The example below assumes matplotlib.pyplot has been imported as plt.

In [21]: fig, axes = plt.subplots(ncols=2, figsize=(7,3), sharex=True, sharey=True)

In [22]: sf.plot(axes[0]);

In [23]: axes[0].set_title("sf");

In [24]: ii = pd.IntervalIndex.from_breaks(range(0, 101, 10))

In [25]: sf.slice(ii).resample("mean").plot(axes[1]);

In [26]: axes[1].set_title("sf - resampled");
[Figure: sf (left) and sf resampled by mean (right) (../_images/slicing_resample.png)]
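The slice, apply, replace recipe for resampling can be illustrated with a minimal pure-Python sketch. It is not staircase's implementation: the step function is assumed to be a list of (start, end, value) pieces that is zero elsewhere, and interval_mean and resample_mean are hypothetical helper names.

```python
def interval_mean(pieces, lo, hi):
    """Mean of the step function over [lo, hi): its integral divided by the interval length."""
    area = sum(
        value * max(0.0, min(end, hi) - max(start, lo))
        for start, end, value in pieces
    )
    return area / (hi - lo)


def resample_mean(pieces, breaks):
    """Replace the step function on each interval between breaks with its mean there."""
    return [
        (lo, hi, interval_mean(pieces, lo, hi))
        for lo, hi in zip(breaks, breaks[1:])
    ]


# Illustrative step function: 1 on [0, 2), 3 on [2, 5), 2 on [5, 10).
pieces = [(0, 2, 1.0), (2, 5, 3.0), (5, 10, 2.0)]

# Resample onto the coarser intervals [0, 4) and [4, 10).
resampled = resample_mean(pieces, [0, 4, 10])
print(resampled)
```

When the applied function is the mean, the resampled step function has the same integral over each slice as the original, so the coarser function preserves the total area while smoothing out detail within each interval.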