Array methods (and pandas Extension Array)
There are plenty of binary operations in staircase, that is, operations which take two step functions and produce a result. There are occasions, however, where we want to perform an operation on several step functions. This typically arises in two cases:
We want to perform separate operations on multiple step functions, with a common parameter. This could include sampling the collection of step functions with a common set of points, or plotting each step function in a collection to the same matplotlib.axes.Axes instance.
We want to perform a single operation which acts upon a collection of step functions, such as creating an average step function, or calculating a covariance matrix.
There are several ways of achieving this in staircase, all of which ultimately rely on staircase.StairsArray, a pandas Extension Array defined for holding staircase.Stairs instances. An ExtensionDtype and a custom Series accessor are also provided to bring the methods defined on staircase.StairsArray to the domain of pandas.Series.
A StairsArray can be defined by passing a collection of staircase.Stairs instances to the constructor.
In [1]: import staircase as sc
In [2]: df = sc.make_test_data(groups=["a", "b", "c"])
In [3]: a = sc.Stairs(df.query("group == 'a'"), "start", "end");
In [4]: b = sc.Stairs(df.query("group == 'b'"), "start", "end");
In [5]: c = sc.Stairs(df.query("group == 'c'"), "start", "end");
In [6]: sc.StairsArray([a,b,c])
Out[6]:
<StairsArray>
[<staircase.Stairs, id=140170722732208>,
<staircase.Stairs, id=140171168754656>,
<staircase.Stairs, id=140170726683840>]
Length: 3, dtype: Stairs
Alternatively, we can create a pandas Series with dtype “Stairs” like so:
In [7]: series_stairs = (
...: df.groupby("group")
...: .apply(sc.Stairs, start="start", end="end")
...: .astype("Stairs")
...: )
...:
In [8]: series_stairs
Out[8]:
group
a <staircase.Stairs, id=140170726682784>
b <staircase.Stairs, id=140170726682928>
c <staircase.Stairs, id=140171168753024>
dtype: Stairs
When a Series has the dtype “Stairs” (StairsDtype), several methods defined on the Series defer to the underlying StairsArray:
StairsArray method | Series method |
---|---|
In addition to these, the standard binary operators (+, -, *, /, >, >=, <, <=, ==, !=) are also defined for StairsArray and Series with “Stairs” dtype. The second operand may be numerical, a staircase.Stairs object, or an array-like collection of these.
Applying additional methods defined on StairsArray to a Series requires the use of the accessor, which is automatically registered with pandas when staircase is imported.
StairsArray method | Series accessor method |
---|---|
For example, to sum together all the step functions in the Series we can use:
In [9]: series_stairs.sum()
Out[9]: <staircase.Stairs, id=140170722151920>
The result is calculated with the optimised method defined on StairsArray. This is faster than the default sum method provided by Series, which would apply a reduction using the staircase.Stairs.__add__() method defined on the Stairs class.
The staircase Series accessor (StairsAccessor), which is named sc, is used like so:
In [10]: ax = series_stairs.sc.plot()
In [11]: ax.legend();
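For readers unfamiliar with pandas accessors, the following is a minimal sketch of the general mechanism, using only pandas. The accessor name `demo`, the class `DemoAccessor` and its method are hypothetical and are not part of staircase; the sketch is not a description of how StairsAccessor itself is implemented.

```python
import pandas as pd


# "demo" and DemoAccessor are hypothetical names used only for this sketch;
# staircase registers its own accessor under the name "sc".
@pd.api.extensions.register_series_accessor("demo")
class DemoAccessor:
    def __init__(self, series):
        # pandas passes in the Series the accessor was reached from
        self._series = series

    def total_length(self):
        # Becomes available as series.demo.total_length()
        return int(self._series.str.len().sum())


words = pd.Series(["ab", "cde"])
result = words.demo.total_length()
print(result)  # 5
```

Registration happens when the decorated class is defined, which is why importing a library is enough to make its accessor available on every Series.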
Note that the underlying StairsArray can be extracted using pandas.Series.values:
In [12]: series_stairs.values
Out[12]:
<StairsArray>
[<staircase.Stairs, id=140170726682784>,
<staircase.Stairs, id=140170726682928>,
<staircase.Stairs, id=140171168753024>]
Length: 3, dtype: Stairs
The above functionality is also available as top-level functions which operate on a variety of collections (containing Stairs objects) such as lists, dictionaries, numpy arrays etc.
Which of the three approaches to take (StairsArray, Series accessor, top-level function) is a matter of taste and convenience.