What’s new in version 2?

Version 2 of staircase was released in September 2021. Whereas version 1 was based upon Sorted Containers, version 2 is based upon pandas and numpy. Making this move required a complete rewrite of staircase internals but yielded significant speedups.

Major enhancements have also been added, namely masking, slicing and the implicit handling of dates. While much of the API has remained the same, or similar, there are many backwards incompatible changes and deprecations, which are detailed below.

In the following, v1 is used to refer to staircase v1.*, and v2 is used to refer to staircase v2.*.

Speed comparison

Enhancements

Optional layer arguments in constructor

staircase.Stairs.__init__() now includes those parameters found in staircase.Stairs.layer(). This was done to simplify the two-step process of initialising a Stairs instance, and then layering, into one.

Constructor signature includes closed parameter

In v1, users were asked to assume a convention, either left-closed or right-closed, for the half-open intervals comprising their step functions. This assumption was then used when calling staircase.Stairs.sample(). To better align with the Zen of Python, in particular “explicit is better than implicit”, users of v2 are required to declare this assumption on construction of a Stairs instance via the closed parameter. The possible values for this parameter are “left” (default) and “right”.

staircase.Stairs.layer() now accepts a dataframe argument

In v1, the layer method had three parameters: start, end and value. These parameters remain in v2, in the same positions, but an optional frame parameter, which accepts a pandas.DataFrame, has been added. Inspired by conventions found in seaborn, if frame is specified then the start, end and value arguments may optionally be strings corresponding to column names in the dataframe.

Masking

Masking is introduced in v2. It allows step functions to have intervals where they are undefined. Areas where a step function is undefined are ignored when calculating statistics, and will be absent from plots. Masking, and related functionality, is facilitated by the following methods:

Please see Masking in the user guide for more information.

Slicing

Slicing is introduced in v2, and is similar in concept to groupby operations in pandas. This functionality allows users to slice a step function into discrete intervals, apply a function, and combine the results. This can be used to convert from a step function to time series data. A new class, staircase.StairsSlicer, has been introduced to facilitate the functionality. However, it is envisioned that users will primarily use staircase.Stairs.slice() rather than creating StairsSlicer instances directly.

Please see Slicing in the user guide for more information.

Extended plotting options

Several parameters have been added to staircase.Stairs.plot() in v2. These include style and arrows. The effect is demonstrated below:

In [1]: sf = sc.Stairs().layer([1,4,2], [3,6,5], [1,1,2])

In [2]: fig, axes = plt.subplots(ncols=2, figsize=(7,3), sharey=True, tight_layout=True)

In [3]: sf.plot(ax=axes[0], arrows=False, style="step");

In [4]: axes[0].set_title('arrows=False, style="step"');

In [5]: sf.plot(ax=axes[1], arrows=True, style="hlines");

In [6]: axes[1].set_title('arrows=True, style="hlines"');
[Figure: side-by-side plots of sf, titled arrows=False, style="step" (left) and arrows=True, style="hlines" (right)]

Please see the plotting intro tutorial for more information.

Reverse operators

Binary operations in staircase have always allowed one operand to be numerical, provided the first operand was a staircase.Stairs instance. Reverse operators have been added in v2 which allow the first operand to be numerical. These operators include

Like their standard counterparts, these operators are best used with their corresponding symbols for readability, eg:

sf = sc.Stairs().layer([1,4,2], [3,6,5], [1,1,2])
sf + 3  # staircase.Stairs.add
3 + sf  # staircase.Stairs.radd

Support for numpy.datetime64 and datetime.datetime

Datetime domains in v1 were possible via pandas.Timestamp. In v2 this has been extended to include numpy.datetime64 and datetime.datetime. Note, however, that numpy does not support time-zone aware variants. “Under the hood”, staircase represents datetimes with pandas.Timestamp, even if the original data used another datetime class. This conversion is inherited from pandas. If you wish to convert from pandas.Timestamp to another datetime class then the following methods may be of use

Please see Working with dates in the user guide for more information.

staircase.Stairs.hist() now has stat parameter

In v2, the stat parameter is introduced to define the statistic used for computing the value of each bin in the histogram. The possibilities, inspired by seaborn.histplot() include

  • sum the magnitude of observations

  • frequency values of the histogram are divided by the corresponding bin width

  • density normalises values of the histogram so that the area is 1

  • probability normalises values so that the histogram values sum to 1
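The relationship between the four options can be sketched in plain Python. This assumes the definitions mirror those of seaborn.histplot(); the bin totals and widths below are hypothetical and no staircase calls are involved:

```python
# Hypothetical "sum" values (the magnitude of observations falling in each bin)
# and the width of each bin
sums = [2.0, 3.0, 5.0]
widths = [1.0, 1.0, 2.0]
total = sum(sums)

# frequency: each bin's sum divided by that bin's width
frequency = [s / w for s, w in zip(sums, widths)]

# probability: sums rescaled so that the bar heights add up to 1
probability = [s / total for s in sums]

# density: sums rescaled so that the total bar area (height * width) is 1
density = [s / (total * w) for s, w in zip(sums, widths)]

print(frequency, probability, density)
```

When all bins have equal width, frequency and density differ from sum and probability only by a constant factor. The distinction matters mostly for unevenly sized bins.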

Please see Distribution of values in the user guide for more information.

Other additions

Backwards incompatible API changes

Optional dataframe first parameter in constructor

In v1, the first parameter of the constructor was value (renamed to initial_value in v2). In v2 the first parameter is now a pandas.DataFrame. This decision was made to facilitate construction of staircase.Stairs instances using pandas.DataFrame.pipe() or “groupby-apply”:

In [7]: df = sc.make_test_data(groups=("a", "b", "c"));

In [8]: df.head()
Out[8]: 
  start                 end  value group
0   NaT 2021-01-02 16:41:00      1     a
1   NaT 2021-01-02 17:05:00      4     a
2   NaT 2021-01-03 08:07:00      2     a
3   NaT 2021-01-04 19:03:00      1     a
4   NaT 2021-01-08 04:46:00      9     a

In [9]: df.query("start > '2021-5'").pipe(sc.Stairs, "start", "end")
Out[9]: <staircase.Stairs, id=139845338345968>

In [10]: df.groupby("group").apply(sc.Stairs, "start", "end")
Out[10]: 
group
a    <staircase.Stairs, id=139845338000304>
b    <staircase.Stairs, id=139845337997472>
c    <staircase.Stairs, id=139845338000784>
dtype: object

staircase.Stairs.clip() sets values to null, not zero

In v1 the clip function, given an interval in the domain defined by lower and upper parameters, would return a copy of the Stairs instance where the value of the step function outside of this interval is zero. In v2 these values are undefined instead.

Please see Masking in the user guide for more information.

staircase.Stairs.sample() now takes only one parameter

In v1, staircase.Stairs.sample() was capable of performing multiple tasks. In the interests of the single-responsibility principle the existing functionality has been delegated to new methods:

staircase.Stairs.step_changes is now a property and returns pandas.Series

In v1, the return type was a dictionary, a consequence of staircase being built upon sortedcontainers.SortedDict. The return type in v2 is a pandas.Series, a consequence of being built upon pandas. It is now a property, rather than a method.

Other changes

Deprecations

value parameter in constructor renamed to initial_value

In v2, the parameters of staircase.Stairs.layer() were added to the Stairs constructor. This resulted in a name clash, which was resolved by renaming the original value parameter in the constructor to initial_value.

use_dates and tz parameters removed from constructor

In v1, datetime domains were facilitated with conversions between pandas.Timestamp and the real numbers. The use of a datetime domain needed to be defined in the constructor, so that the Stairs instance could be instructed to make, or not make, conversions when needed. In v2, there are no such conversions, and the values in the Stairs internal data structures can remain as pandas.Timestamp.

Domain parameters removed from statistical functions

In v1, there are several statistical methods which take lower and upper parameters to restrict a calculation to an interval. In v2 these parameters have been removed in favour of using the method on a “clipped” step function. Affected methods include:

staircase.Stairs.integral_and_mean removed

Integral and mean calculations for functions are related, and the latter relies on the former. To avoid duplicate calculations v1 provided this method. In v2 both results are calculated when either staircase.Stairs.integral() or staircase.Stairs.mean() is called, and are cached on the Stairs instance to avoid duplicate calculation.

staircase.Stairs.percentile_stairs removed

In v1, staircase.Stairs.percentile_stairs returned an instance of staircase.Stairs, and staircase.Stairs.percentile() could be used for evaluating percentile values. In v2, staircase.Stairs.percentile combines both of these functions. It is an accessor (think of it like a property) that returns an instance of a Percentiles class (a subclass of Stairs). The Percentiles class is callable, providing the ability to evaluate percentile values.

staircase.Stairs.ecdf_stairs removed

Similar to the case for staircase.Stairs.percentile_stairs above, the existing functionality is now provided by staircase.Stairs.ecdf, an accessor which returns an ECDF class (a subclass of Stairs). The ECDF class is callable.

staircase.Stairs.resample removed

Resampling is now achieved through slicing. Please see Slicing for more information.

Other deprecations