Array methods (and pandas Extension Array)
There are plenty of binary operations in staircase, that is, operations which take two step functions and produce a result. There are occasions, however, where we want to perform an operation on several step functions. There are typically two cases in which this arises:
- We want to perform separate operations on multiple step functions, with a common parameter. This could include sampling the collection of step functions with a common set of points, or plotting each step function in a collection to the same matplotlib.axes.Axes instance.
- We want to perform a single operation which acts upon a collection of step functions, such as creating an average step function, or calculating a covariance matrix.
There are several ways of achieving this in staircase, all of which ultimately rely on staircase.StairsArray - a pandas Extension Array defined for holding staircase.Stairs instances. An ExtensionDtype and a custom Series accessor are also provided to bring the methods defined on staircase.StairsArray to the domain of pandas.Series.
A StairsArray can be defined by passing a collection of staircase.Stairs instances to the constructor.
In [1]: import staircase as sc
In [2]: df = sc.make_test_data(groups=["a", "b", "c"])
In [3]: a = sc.Stairs(df.query("group == 'a'"), "start", "end");
In [4]: b = sc.Stairs(df.query("group == 'b'"), "start", "end");
In [5]: c = sc.Stairs(df.query("group == 'c'"), "start", "end");
In [6]: sc.StairsArray([a,b,c])
Out[6]:
<StairsArray>
[<staircase.Stairs, id=139813175486928>,
<staircase.Stairs, id=139813176477584>,
<staircase.Stairs, id=139813637427856>]
Length: 3, dtype: Stairs
Alternatively, we can create a pandas Series with dtype “Stairs”, like so:
In [7]: series_stairs = (
...: df.groupby("group")[["start", "end"]]
...: .apply(sc.Stairs, start="start", end="end")
...: .astype("Stairs")
...: )
...:
In [8]: series_stairs
Out[8]:
group
a <staircase.Stairs, id=139813173513232>
b <staircase.Stairs, id=139813178308496>
c <staircase.Stairs, id=139813173074896>
dtype: Stairs
When a Series has the dtype “Stairs” (StairsDtype) there are several methods defined on the Series which defer to the underlying StairsArray:
| StairsArray method | Series method |
|---|---|
In addition to these, the standard binary operators (+, -, *, /, >, >=, <, <=, ==, !=) are also defined for StairsArray and for Series with “Stairs” dtype. The second operand may be numerical, a staircase.Stairs object, or an array-like collection of these.
Applying additional methods defined on StairsArray to a Series requires the use of the accessor, which is automatically registered with pandas when staircase is imported.
| StairsArray method | Series accessor method |
|---|---|
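For readers unfamiliar with pandas accessors, the following is a minimal sketch of the general mechanism, using only the public pandas function pandas.api.extensions.register_series_accessor. It is not staircase's implementation: the accessor name demo, the class DemoAccessor and its span method are hypothetical and exist only for illustration.

```python
import pandas as pd


# "demo" is a hypothetical accessor name chosen for this sketch
@pd.api.extensions.register_series_accessor("demo")
class DemoAccessor:
    def __init__(self, series):
        # pandas passes in the Series the accessor is called on
        self._series = series

    def span(self):
        # Defer to methods of the underlying array, much as an
        # accessor can defer to methods of an extension array
        values = self._series.values
        return values.max() - values.min()


# Registration happens when the decorated class is defined, so the
# accessor is available on every Series from this point on
result = pd.Series([3, 9, 4]).demo.span()
print(result)
```

Because registration happens as a side effect of defining the class, importing a package that defines an accessor is enough to make it available.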
For example, to sum together all the step functions in the Series we can use
In [9]: series_stairs.sum()
Out[9]: <staircase.Stairs, id=139813174588304>
The result is calculated with the performant method defined on StairsArray. This is faster than the default sum method provided by Series, which would apply a reduction using the staircase.Stairs.__add__() method defined on the Stairs class.
The staircase Series accessor (StairsAccessor), which is named sc, is used like so:
In [10]: ax = series_stairs.sc.plot()
In [11]: ax.legend();
Note that the underlying StairsArray can be extracted using pandas.Series.values:
In [12]: series_stairs.values
Out[12]:
<StairsArray>
[<staircase.Stairs, id=139813173513232>,
<staircase.Stairs, id=139813178308496>,
<staircase.Stairs, id=139813173074896>]
Length: 3, dtype: Stairs
The above functionality is also available as top level functions which operate on a variety of collections (containing Stairs objects) such as lists, dictionaries, numpy arrays etc.
Which of the three approaches to take (StairsArray, Series accessor, top level function) is a matter of taste and convenience.