Array methods (and pandas Extension Array)

staircase provides plenty of binary operations, that is, operations which act on two step functions to produce a result. There are occasions, however, where we want to perform an operation on several step functions. This typically arises in two cases:

  1. We want to perform separate operations on multiple step functions, with a common parameter. This could include sampling the collection of step functions with a common set of points, or plotting each step function in a collection to the same matplotlib.axes.Axes instance.

  2. We want to perform a single operation which acts upon a collection of step functions, such as creating an average step function, or calculating a covariance matrix.

There are several ways of achieving this in staircase, all of which ultimately rely on staircase.StairsArray, a pandas Extension Array defined for holding staircase.Stairs instances. An ExtensionDtype and a custom Series accessor are also provided, so that the methods defined on staircase.StairsArray can be used on a pandas.Series.

A StairsArray can be defined by passing a collection of staircase.Stairs instances to the constructor.

In [1]: import staircase as sc

In [2]: df = sc.make_test_data(groups=["a", "b", "c"])

In [3]: a = sc.Stairs(df.query("group == 'a'"), "start", "end");

In [4]: b = sc.Stairs(df.query("group == 'b'"), "start", "end");

In [5]: c = sc.Stairs(df.query("group == 'c'"), "start", "end");

In [6]: sc.StairsArray([a,b,c])
Out[6]: 
<StairsArray>
[<staircase.Stairs, id=140170722732208>,
 <staircase.Stairs, id=140171168754656>,
 <staircase.Stairs, id=140170726683840>]
Length: 3, dtype: Stairs

Alternatively, we can create a pandas Series with dtype “Stairs” like so:

In [7]: series_stairs = (
   ...:     df.groupby("group")
   ...:     .apply(sc.Stairs, start="start", end="end")
   ...:     .astype("Stairs")
   ...: )
   ...: 

In [8]: series_stairs
Out[8]: 
group
a    <staircase.Stairs, id=140170726682784>
b    <staircase.Stairs, id=140170726682928>
c    <staircase.Stairs, id=140171168753024>
dtype: Stairs

When a Series has the dtype “Stairs” (StairsDtype) there are several methods defined on the Series which defer to the underlying StairsArray:

StairsArray method                 Series method
---------------------------------  ----------------------
staircase.StairsArray.sum()        pandas.Series.sum()
staircase.StairsArray.mean()       pandas.Series.mean()
staircase.StairsArray.median()     pandas.Series.median()
staircase.StairsArray.min()        pandas.Series.min()
staircase.StairsArray.max()        pandas.Series.max()
staircase.StairsArray.agg()        pandas.Series.agg()

In addition to these, the standard binary operators (+, -, *, /, >, >=, <, <=, ==, !=) are defined for StairsArray and for Series with “Stairs” dtype. The second operand may be a number, a staircase.Stairs object, or an array-like collection of these.

Applying additional methods defined on StairsArray to a Series requires the use of the accessor, which is automatically registered with pandas when staircase is imported.

StairsArray method                    Series accessor method
------------------------------------  ----------------------------
staircase.StairsArray.sample()        StairsAccessor.sample()
staircase.StairsArray.limit()         StairsAccessor.limit()
staircase.StairsArray.logical_or()    StairsAccessor.logical_or()
staircase.StairsArray.logical_and()   StairsAccessor.logical_and()
staircase.StairsArray.corr()          StairsAccessor.corr()
staircase.StairsArray.cov()           StairsAccessor.cov()
staircase.StairsArray.plot()          StairsAccessor.plot()

For example, to sum together all the step functions in the Series we can use

In [9]: series_stairs.sum()
Out[9]: <staircase.Stairs, id=140170722151920>

The result is calculated with the optimised method defined on StairsArray. This is faster than the default sum method provided by Series, which would apply a reduction using the staircase.Stairs.__add__() method defined on the Stairs class.

The staircase Series accessor (StairsAccessor) is named sc and is used like so:

In [10]: ax = series_stairs.sc.plot()

In [11]: ax.legend();
../_images/user_guide_accessor_plot.png

Note that the underlying StairsArray can be extracted using pandas.Series.values:

In [12]: series_stairs.values
Out[12]: 
<StairsArray>
[<staircase.Stairs, id=140170726682784>,
 <staircase.Stairs, id=140170726682928>,
 <staircase.Stairs, id=140171168753024>]
Length: 3, dtype: Stairs

The above functionality is also available as top level functions which operate on a variety of collections (containing Stairs objects) such as lists, dictionaries, numpy arrays etc.

Which of the three approaches to use (StairsArray, Series accessor, or top level function) is a matter of taste and convenience.