Array methods (and pandas Extension Array)
There are plenty of binary operations in staircase. These operate on two step functions to produce a result. There are occasions, however, where we want to perform an operation on several step functions. There are typically two cases in which this arises:

1. We want to perform separate operations on multiple step functions, with a common parameter. This could include sampling the collection of step functions with a common set of points, or plotting each step function in a collection to the same matplotlib Axes.
2. We want to perform a single operation which acts upon a collection of step functions, such as creating an average step function, or calculating a covariance matrix.
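The two cases above can be illustrated without staircase at all. The following is a minimal, stdlib-only sketch. It assumes a hypothetical representation in which a step function is a dict mapping each breakpoint to the change in value at that breakpoint, and it uses hypothetical helper names (value_at, sample_all, mean_step_function). It is not staircase's API or internals. Sampling is case 1 (one operation per function, with shared points). The mean is case 2 (one operation over the whole collection).

```python
# Hypothetical representation, not staircase's internals: a step function is a
# dict mapping each breakpoint x to the change in value that occurs at x.
# The function is 0 before its first breakpoint; the change at x applies from x onward.


def value_at(deltas, x):
    """Evaluate one step function at x by summing every change at a breakpoint <= x."""
    return sum(d for point, d in deltas.items() if point <= x)


def sample_all(collection, points):
    """Case 1: the same operation (sampling) applied to each function, with common points."""
    return [[value_at(f, x) for x in points] for f in collection]


def mean_step_function(collection):
    """Case 2: one operation over the whole collection, producing a single step function."""
    n = len(collection)
    combined = {}
    for f in collection:
        for point, d in f.items():
            # The mean changes by d / n wherever one member function changes by d.
            combined[point] = combined.get(point, 0) + d / n
    return dict(sorted(combined.items()))


a = {0: 1, 2: -1}  # 1 on [0, 2)
b = {1: 2, 3: -2}  # 2 on [1, 3)
c = {0: 3, 4: -3}  # 3 on [0, 4)
print(sample_all([a, b, c], [0, 1, 2, 3]))  # [[1, 1, 0, 0], [0, 2, 2, 0], [3, 3, 3, 3]]
```

The mean is built by scaling every change by 1/n rather than by evaluating the functions on a grid, so the result is again an exact step function.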
There are several ways of achieving this in staircase, all of which ultimately rely on staircase.StairsArray, a pandas Extension Array defined for holding staircase.Stairs instances. An ExtensionDtype and a custom Series accessor are also provided to bring the methods defined on staircase.StairsArray to the domain of pandas.Series.
A StairsArray can be defined by passing a collection of staircase.Stairs instances to the constructor:
In : import staircase as sc

In : df = sc.make_test_data(groups=["a", "b", "c"])

In : a = sc.Stairs(df.query("group == 'a'"), "start", "end");

In : b = sc.Stairs(df.query("group == 'b'"), "start", "end");

In : c = sc.Stairs(df.query("group == 'c'"), "start", "end");

In : sc.StairsArray([a,b,c])
Out:
<StairsArray>
[<staircase.Stairs, id=140217124381120>, <staircase.Stairs, id=140217123166576>, <staircase.Stairs, id=140217572985872>]
Length: 3, dtype: Stairs
Alternatively, we can create a pandas Series with dtype “Stairs”, like so:
In : series_stairs = (
   ...:     df.groupby("group")
   ...:     .apply(sc.Stairs, start="start", end="end")
   ...:     .astype("Stairs")
   ...: )
   ...:

In : series_stairs
Out:
group
a    <staircase.Stairs, id=140217561124192>
b    <staircase.Stairs, id=140217562851216>
c    <staircase.Stairs, id=140217561123568>
dtype: Stairs
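The groupby/apply pattern above yields one step function per group label. The sketch below reproduces that idea with the standard library only, to show what the grouping step does. It assumes a hypothetical dict-of-changes representation and a hypothetical helper, stairs_by_group. It also assumes that each interval contributes a value of 1 over [start, end), which is an assumption made for this illustration and not a statement about staircase's defaults.

```python
from collections import defaultdict

# Hypothetical sketch of "one step function per group", mirroring
# df.groupby("group").apply(sc.Stairs, start="start", end="end").
# Assumption: each interval contributes a value of 1 on [start, end).
# A step function is a dict: breakpoint -> change in value at that breakpoint.


def stairs_by_group(rows, group_key="group", start="start", end="end"):
    """Return {group label: step function} built from interval records."""
    result = defaultdict(dict)
    for row in rows:
        deltas = result[row[group_key]]
        # Entering the interval raises the value by 1.
        deltas[row[start]] = deltas.get(row[start], 0) + 1
        # Leaving the interval lowers it by 1 again.
        deltas[row[end]] = deltas.get(row[end], 0) - 1
    return dict(result)


rows = [
    {"group": "a", "start": 0, "end": 2},
    {"group": "a", "start": 1, "end": 3},
    {"group": "b", "start": 0, "end": 1},
]
print(stairs_by_group(rows))
# {'a': {0: 1, 2: -1, 1: 1, 3: -1}, 'b': {0: 1, 1: -1}}
```

The group labels become the keys of the result, much as they become the index of series_stairs.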
When a Series has the dtype “Stairs” (StairsDtype), several methods defined on the Series defer to the underlying StairsArray methods, for example sum.
In addition to these, the standard binary operators (+, -, *, /, <, >, <=, >=, ==, !=) are also defined for StairsArray and for Series with “Stairs” dtype. The second operand may be numerical, a staircase.Stairs object, or an array-like collection of these.
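The rule that the second operand may be a number, a single step function, or an array-like of these amounts to broadcasting. The following stdlib-only sketch shows one way such elementwise operators can work. Step, combine, as_step and array_binop are hypothetical names, and Step is a toy class, not staircase.Stairs. A number is treated as a constant function. A single step function is paired with every element. An equal-length list or tuple is paired position by position.

```python
import operator
from dataclasses import dataclass
from numbers import Number


# Hypothetical minimal step function, not staircase.Stairs.
@dataclass(frozen=True)
class Step:
    base: float = 0  # value before the first breakpoint
    steps: tuple = ()  # (x, value) pairs sorted by x: from x onward the function equals value

    def __call__(self, x):
        value = self.base
        for point, v in self.steps:
            if point > x:
                break
            value = v
        return value


def combine(f, g, op):
    """Apply a binary operator pointwise to two step functions."""
    points = sorted({p for p, _ in f.steps} | {p for p, _ in g.steps})
    return Step(op(f.base, g.base), tuple((p, op(f(p), g(p))) for p in points))


def as_step(x):
    """A plain number is treated as a constant step function."""
    return Step(x) if isinstance(x, Number) else x


def array_binop(collection, other, op):
    """Elementwise op between a collection of step functions and a number,
    a single step function, or an equal-length list/tuple of these."""
    if isinstance(other, (list, tuple)):
        if len(other) != len(collection):
            raise ValueError("array-like operand must have the same length as the collection")
        return [combine(f, as_step(o), op) for f, o in zip(collection, other)]
    # A scalar or a single step function is broadcast against every element.
    return [combine(f, as_step(other), op) for f in collection]


a = Step(0, ((0, 1), (2, 0)))  # 1 on [0, 2)
b = Step(0, ((1, 2), (3, 0)))  # 2 on [1, 3)
print(array_binop([a, b], 10, operator.add))
# [Step(base=10, steps=((0, 11), (2, 10))), Step(base=10, steps=((1, 12), (3, 10)))]
```

Relational operators such as operator.ge or operator.ne go through the same combine function and produce boolean-valued step functions. This sketch does not merge redundant breakpoints.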
For example, to sum together all the step functions in the Series we can use
In : series_stairs.sum()
Out: <staircase.Stairs, id=140217572532176>
The result is calculated with the performant sum method defined on StairsArray. This is faster than the default sum method provided by Series, which would reduce the collection pairwise using the staircase.Stairs.__add__() method defined on the Stairs class.
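The following stdlib-only sketch shows why a dedicated sum can beat a pairwise reduction. It is not staircase's implementation. The dict-of-changes representation and the helper names (add_pair, sum_by_reduction, sum_single_pass) are hypothetical. Reducing with a binary add builds a new intermediate step function at every step, copying all breakpoints accumulated so far. When the functions have mostly distinct breakpoints, this total work can grow roughly quadratically with the number of functions. The single-pass version visits each change once and sorts once.

```python
from functools import reduce

# Hypothetical representation, not staircase's internals: a step function is a
# dict mapping breakpoint -> change in value at that breakpoint.


def add_pair(f, g):
    """Binary addition, analogous to a binary __add__: returns a new step function."""
    result = dict(f)  # copy so the inputs are left unchanged
    for point, delta in g.items():
        result[point] = result.get(point, 0) + delta
    return result


def sum_by_reduction(collection):
    """Pairwise reduction: builds len(collection) - 1 intermediate step functions,
    each copying all breakpoints accumulated so far."""
    return reduce(add_pair, collection)


def sum_single_pass(collection):
    """Accumulate every change into a single mapping in one pass, then sort once."""
    total = {}
    for f in collection:
        for point, delta in f.items():
            total[point] = total.get(point, 0) + delta
    return dict(sorted(total.items()))


fs = [{0: 1, 2: -1}, {1: 2, 3: -2}, {0: 3, 4: -3}]
print(sum_single_pass(fs))  # {0: 4, 1: 2, 2: -1, 3: -2, 4: -3}
```

Both functions produce the same step function. Note that reduce raises TypeError on an empty collection, whereas the single-pass version returns an empty mapping.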
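Further StairsArray methods are available through the staircase Series accessor (StairsAccessor), which is named sc. For example, to plot each step function in the Series to the same axes: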
In : ax = series_stairs.sc.plot()

In : ax.legend();
Note that the underlying StairsArray can be extracted using the values attribute:
In : series_stairs.values
Out:
<StairsArray>
[<staircase.Stairs, id=140217561124192>, <staircase.Stairs, id=140217562851216>, <staircase.Stairs, id=140217561123568>]
Length: 3, dtype: Stairs
The functionality above is also available as top-level functions, which operate on a variety of collections containing Stairs objects, such as lists, dictionaries and numpy arrays.
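Top-level functions that accept several collection types mostly need to normalise their input. The following stdlib-only sketch shows one way to do that. It is not staircase's code. Step, sample_collection and sum_collection are hypothetical names, and Step is a toy class, not staircase.Stairs. A mapping contributes its values and keeps its keys as labels. Any other iterable, such as a list, tuple or NumPy object array, is labelled by position.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical step function (not staircase.Stairs): breakpoint -> change in value."""
    deltas: dict = field(default_factory=dict)

    def __call__(self, x):
        # Value at x is the sum of all changes at breakpoints <= x.
        return sum(d for point, d in self.deltas.items() if point <= x)


def _labelled(collection):
    """Return (label, step function) pairs: mapping keys as labels, otherwise positions."""
    if isinstance(collection, Mapping):
        return list(collection.items())
    return list(enumerate(collection))


def sample_collection(collection, points):
    """Sample every step function at common points, keeping each function's label."""
    return {label: [f(x) for x in points] for label, f in _labelled(collection)}


def sum_collection(collection):
    """Sum the collection into a single step function, whatever its container type."""
    deltas = {}
    for _, f in _labelled(collection):
        for point, d in f.deltas.items():
            deltas[point] = deltas.get(point, 0) + d
    return Step(deltas)


a = Step({0: 1, 2: -1})  # 1 on [0, 2)
b = Step({1: 2, 3: -2})  # 2 on [1, 3)
print(sample_collection({"a": a, "b": b}, [0, 1, 2]))  # {'a': [1, 1, 0], 'b': [0, 2, 2]}
```

Keeping labels in the output mirrors how a Series keeps its index. The dictionary case matters because iterating a dict directly would yield its keys rather than the step functions.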
Which of the three approaches to use (StairsArray methods, the Series accessor, or top-level functions) is a matter of taste and convenience.