Array methods (and pandas Extension Array)

There are plenty of binary operations in staircase, that is, operations which take two step functions and produce a result. There are occasions, however, where we want to perform an operation on several step functions. This typically arises in two cases:

  1. We want to perform separate operations on multiple step functions, with a common parameter. This could include sampling the collection of step functions with a common set of points, or plotting each step function in a collection to the same matplotlib.axes.Axes instance.

  2. We want to perform a single operation which acts upon a collection of step functions, such as creating an average step function, or calculating a covariance matrix.
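The two cases can be illustrated with a small, self-contained sketch. It deliberately does not use staircase: the `Step` class and the `mean` function below are hypothetical toy constructs, assumed here only to make the distinction concrete, and they say nothing about how staircase is implemented.

```python
from bisect import bisect_right
from itertools import accumulate


class Step:
    """Toy step function (not staircase.Stairs).

    The value is 0 before the first breakpoint and changes by the given
    amount at each breakpoint; intervals are closed on the left.
    """

    def __init__(self, changes):
        # changes: dict mapping x -> change in value at x
        items = sorted(changes.items())
        self.points = [x for x, _ in items]
        self.values = list(accumulate(d for _, d in items))

    def __call__(self, x):
        # Value of the function at x.
        i = bisect_right(self.points, x)
        return self.values[i - 1] if i else 0


# f is 1 on [0, 2); g is 2 on [1, 3).
f = Step({0: 1, 2: -1})
g = Step({1: 2, 3: -2})
collection = {"f": f, "g": g}

# Case 1: separate operations with a common parameter (shared sample points).
xs = [0.5, 1.5, 2.5]
samples = {name: [s(x) for x in xs] for name, s in collection.items()}


# Case 2: a single operation acting on the whole collection (pointwise mean).
def mean(steps):
    merged = {}
    for s in steps:
        prev = 0
        for x, v in zip(s.points, s.values):
            merged[x] = merged.get(x, 0) + (v - prev) / len(steps)
            prev = v
    return Step(merged)


avg = mean(collection.values())
```

In case 1 the result has one entry per step function, whereas in case 2 the whole collection is reduced to a single step function.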

There are several ways of achieving this in staircase, all of which ultimately rely on staircase.StairsArray - a pandas Extension Array defined for holding staircase.Stairs instances. An ExtensionDtype and a custom Series accessor are also provided to bring the methods defined on staircase.StairsArray to the domain of pandas.Series.

A StairsArray can be defined by passing a collection of staircase.Stairs instances to the constructor.

In [1]: import staircase as sc

In [2]: df = sc.make_test_data(groups=["a", "b", "c"])

In [3]: a = sc.Stairs(df.query("group == 'a'"), "start", "end");

In [4]: b = sc.Stairs(df.query("group == 'b'"), "start", "end");

In [5]: c = sc.Stairs(df.query("group == 'c'"), "start", "end");

In [6]: sc.StairsArray([a,b,c])
Out[6]: 
<StairsArray>
[<staircase.Stairs, id=140385501379984>,
 <staircase.Stairs, id=140385902684496>,
 <staircase.Stairs, id=140385515884496>]
Length: 3, dtype: Stairs

We can alternatively create a pandas Series with dtype “Stairs”, like so:

In [7]: series_stairs = (
   ...:     df.groupby("group")[["start", "end"]]
   ...:     .apply(sc.Stairs, start="start", end="end")
   ...:     .astype("Stairs")
   ...: )
   ...: 

In [8]: series_stairs
Out[8]: 
group
a    <staircase.Stairs, id=140385506488080>
b    <staircase.Stairs, id=140385507156048>
c    <staircase.Stairs, id=140385507159824>
dtype: Stairs

When a Series has the dtype “Stairs” (StairsDtype), several methods defined on the Series defer to the underlying StairsArray.

In addition to these, the standard binary operators (+, -, *, /, >, >=, <, <=, ==, !=) are also defined for StairsArray and for Series with “Stairs” dtype. The second operand may be numerical, a staircase.Stairs object, or an array-like collection of these.

Applying additional methods defined on StairsArray to a Series requires the use of the accessor, which is automatically registered with pandas when staircase is imported.

For example, sum is one of the methods available directly on the Series, so to sum together all the step functions in the Series we can use:

In [9]: series_stairs.sum()
Out[9]: <staircase.Stairs, id=140385507559120>

The result is calculated with the performant method defined on StairsArray. This is faster than the default sum method provided by Series, which applies a reduction using the staircase.Stairs.__add__() method.
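One plausible reason a reduction is slower can be sketched with a toy representation. Here a step function is assumed to be a plain dict mapping each breakpoint to the change in value at that point. The `add` and `batch_sum` functions are hypothetical illustrations, not staircase's implementation.

```python
from functools import reduce


def add(f, g):
    """Pairwise sum of two toy step functions stored as {x: change at x}."""
    out = dict(f)  # copies every breakpoint accumulated so far
    for x, d in g.items():
        out[x] = out.get(x, 0) + d
    return out


def batch_sum(functions):
    """Sum a whole collection in a single pass over all breakpoints."""
    out = {}
    for f in functions:
        for x, d in f.items():
            out[x] = out.get(x, 0) + d
    return out


# 100 toy step functions, the i-th being 1 on [i, i + 1).
functions = [{i: 1, i + 1: -1} for i in range(100)]

by_reduction = reduce(add, functions)  # builds 99 intermediate results
in_one_pass = batch_sum(functions)     # builds a single result
```

Both routes give the same answer, but the reduction re-copies an ever-growing intermediate result at each step, while the batch version touches each breakpoint once.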

The staircase Series accessor (StairsAccessor) is named sc and is used like so:

In [10]: ax = series_stairs.sc.plot()

In [11]: ax.legend();
[Figure: user_guide_accessor_plot.png, the step functions in series_stairs plotted on the same Axes, with a legend]

Note that the underlying StairsArray can be extracted using pandas.Series.values:

In [12]: series_stairs.values
Out[12]: 
<StairsArray>
[<staircase.Stairs, id=140385506488080>,
 <staircase.Stairs, id=140385507156048>,
 <staircase.Stairs, id=140385507159824>]
Length: 3, dtype: Stairs

The above functionality is also available as top-level functions which operate on a variety of collections containing Stairs objects, such as lists, dictionaries and numpy arrays.

Which of the three approaches to take (StairsArray, Series accessor, or top-level function) is a matter of taste and convenience.