Getting started

Introduction

The staircase package is used to model step functions. We discuss what a step function is below, but first let’s talk application. Step functions can be used to represent time series - think changes in state over time, queue counts over time, utilisation over time - you get the idea.

The staircase package makes converting raw, temporal data into time series easy and readable. It also provides a rich variety of arithmetic, relational, logical and statistical operations to enable analysis, along with functions for univariate analysis and aggregation, and compatibility with pandas.Timestamp.

What is a step function?

A step function, also known as a staircase function, is a piecewise constant function defined over the real numbers. It can be characterised as a function f defined over a sequence of disjoint intervals and where f(x) = f(y) whenever x and y belong to the same interval.

A formal definition:

step function formal definition
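
One standard way to write such a definition, following the formulation given on Wikipedia (an assumption here, not taken from the staircase documentation itself), is:

```latex
\[
f(x) = \sum_{i=0}^{n} \alpha_i \, \chi_{A_i}(x) \quad \text{for all } x \in \mathbb{R}
\]
```

where \(n \geq 0\), the \(\alpha_i\) are real numbers, the \(A_i\) are intervals, and \(\chi_A\) is the indicator function of \(A\), equal to \(1\) if \(x \in A\) and \(0\) otherwise.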

The staircase package can be used to model step functions over the real numbers. Below we show two examples of step functions. In the left plot the step function is composed of left-closed right-open intervals, and in the right plot the step function is composed of left-open right-closed intervals.

Two examples of step functions

To help clarify the characteristics of a step function we show two plots below which do not contain step functions. The chart on the left shows a function which is not piecewise-constant while the chart on the right shows a relation that fails to be a function.

Two examples of relations which are not step functions

Currently, the staircase package is not capable of encapsulating all step functions. Specifically, it does not accommodate step functions with degenerate intervals, i.e. intervals of the form \([a,a]\) which contain a single number only. In addition, the domain must be simply connected, i.e. the function must be defined for all real numbers. The figure below shows two step functions which cannot currently be modelled with staircase. Neither step function has a simply connected domain, and the one on the left also has degenerate intervals.

Two examples of step functions which cannot be modelled with the staircase package

More information on step functions can be found on Wikipedia.

A note on interval endpoints

In general, it is possible for the disjoint intervals comprising a step function to be closed, half-closed or open. However, the staircase package does not explicitly model which interval endpoints are open and which are closed, nor does it model the value of the step function at the interval endpoints. In fact, the limitations of the staircase package can be summarised by the following statement:

Let \(z \in \mathbb{R}\) and \(f\) a step function with a simply connected domain. The staircase package does not provide functionality to evaluate \(f(z)\) directly. Instead it can only evaluate the one-sided limits \(\lim_{x \to z^{-}} f(x)\) or \(\lim_{x \to z^{+}} f(x)\).

This is why staircase cannot accommodate step functions with degenerate intervals. The value of \(f(z)\) however can be inferred under certain assumptions. Let \(S\) be the set of step functions with simply connected domains, and containing no degenerate intervals. Furthermore, let \(S_L \subseteq S\) and \(S_R \subseteq S\) be those step functions having only left-closed right-open intervals, and left-open right-closed intervals respectively, then

  • If \(f \in S_L\) then \(f(z) = \lim_{x \to z^{+}} f(x)\)
  • If \(f \in S_R\) then \(f(z) = \lim_{x \to z^{-}} f(x)\)

Note that by definition, any step function containing a degenerate interval cannot belong to \(S_L\) or \(S_R\). The class in the staircase package which provides an abstraction for step functions has a method sample which, depending on a parameter, calculates either \(\lim_{x \to z^{-}} f(x)\) or \(\lim_{x \to z^{+}} f(x)\). The sample method can then be interpreted as calculating \(f(z)\) if \(f \in S_L \cup S_R\).
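
The idea of sampling by one-sided limits can be illustrated with a minimal, self-contained sketch. The `StepSketch` class and its method names below are hypothetical and are not part of the staircase API; the sketch only assumes a step function described by sorted change points and the constant value on each interval between them.

```python
from bisect import bisect_left, bisect_right


class StepSketch:
    """Hypothetical stand-in for a step function (not the staircase implementation)."""

    def __init__(self, points, values):
        # points: sorted locations where the value changes
        # values: one value per interval between change points,
        #         so there is exactly one more value than there are points
        if len(values) != len(points) + 1:
            raise ValueError("need len(points) + 1 values")
        self.points = list(points)
        self.values = list(values)

    def limit_from_left(self, z):
        # Value on the interval immediately to the left of z
        return self.values[bisect_left(self.points, z)]

    def limit_from_right(self, z):
        # Value on the interval immediately to the right of z
        return self.values[bisect_right(self.points, z)]


# A function equal to 0 below zero and 1 above zero; f(0) itself is not modelled
step = StepSketch(points=[0], values=[0, 1])
left_at_zero = step.limit_from_left(0)
right_at_zero = step.limit_from_right(0)
```

Here `left_at_zero` is 0 and `right_at_zero` is 1. Under the \(S_L\) convention the right limit would be read as \(f(0)\), and under the \(S_R\) convention the left limit would be.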

Furthermore, each \(S' \in \{S_L, S_R\}\) is closed under every operation \(op \in \{+, *, -, >, \geq, <, \leq, ==, !=\}\). In layman’s terms, if we perform an operation \(op\) on two step functions belonging to \(S_L\) then the result also belongs to \(S_L\). The same goes for \(S_R\). This is not true however for \(S\). For example, let \(f(x) = 1\) if and only if \(x \geq 0\), and \(0\) otherwise, and let \(g(x) = 1\) if and only if \(x > 0\), and \(0\) otherwise. In this example \(f \in S_L\) and \(g \in S_R\). The result of \(f + g\) is a step function which does not belong to \(S\) since it contains the degenerate interval \([0,0]\).
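
The counterexample can be checked numerically with plain Python functions. This is only an illustrative sketch: the functions `f`, `g` and `h` are defined directly, without staircase, and `eps` is an arbitrary small offset chosen for the demonstration.

```python
def f(x):
    # 1 on [0, inf) and 0 elsewhere: left-closed intervals, so f is in S_L
    return 1 if x >= 0 else 0


def g(x):
    # 1 on (0, inf) and 0 elsewhere: right-closed intervals, so g is in S_R
    return 1 if x > 0 else 0


def h(x):
    # Pointwise sum of f and g
    return f(x) + g(x)


eps = 1e-9  # arbitrary small offset used to probe either side of zero
just_left, at_zero, just_right = h(-eps), h(0), h(eps)
```

The three values are 0, 1 and 2 respectively. The value at zero matches neither neighbour, which is exactly the degenerate interval \([0,0]\).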

In conclusion, it is recommended that users adopt the convention to assume all step functions belong to \(S_L\), or alternatively \(S_R\).

A small example

The number of users viewing this webpage over time can be modelled as a step function. The value of the function increases by 1 every time a user arrives at the page, and decreases by 1 every time a user leaves the page. Let’s say we have this data in vector format (e.g. tuple, list, numpy array, pandas series). Specifically, assume arrive and leave are vectors of times, expressed as minutes past midnight, for all page views occurring yesterday. Creating the corresponding step function is simple. To achieve it we use the Stairs class:

>>> import staircase as sc

>>> views = sc.Stairs()
>>> views.layer(arrive, leave)
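
To make the idea of layering concrete, the following sketch builds the same kind of step function by hand. It does not use staircase and is not how the package is implemented. The sample `arrive` and `leave` values and the helper `step_changes` are hypothetical, chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical sample data, in minutes past midnight (not the data behind the figures)
arrive = [10, 20, 30]
leave = [40, 25, 90]


def step_changes(starts, ends):
    """Return sorted (time, delta) pairs: +1 at each start, -1 at each end."""
    changes = {}
    for t in starts:
        changes[t] = changes.get(t, 0) + 1
    for t in ends:
        changes[t] = changes.get(t, 0) - 1
    # Drop times where arrivals and departures cancel out exactly
    return sorted((t, d) for t, d in changes.items() if d != 0)


changes = step_changes(arrive, leave)

# Value of the step function immediately after each change point
concurrent = list(accumulate(delta for _, delta in changes))
max_concurrent = max(concurrent)
```

For this sample data the function changes at times 10, 20, 25, 30, 40 and 90, takes the values 1, 2, 1, 2, 1 and 0 after those points, and reaches a maximum of 2 concurrent viewers.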

We can visualise the function with the plot method:

>>> views.plot()
pageviews example created with plot method

We can find the total time the page was viewed:

>>> views.integrate(0,1440)
9297.94622521079

We can find the average number of viewers:

>>> views.mean(0,1440)
6.4569071008408265
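
The two results above are related: the mean over an interval is the integral divided by the interval length, and indeed 9297.946... / 1440 ≈ 6.4569. The sketch below shows this relationship on hypothetical data. The `integrate` helper and the `changes` list are illustrative assumptions, not staircase internals.

```python
def integrate(changes, lower, upper):
    """Integrate a step function that is 0 before its first change point.

    changes: sorted (time, delta) pairs; lower and upper are the integration bounds.
    """
    total, value, prev = 0, 0, lower
    for t, delta in changes:
        t = min(max(t, lower), upper)  # clamp the change point into [lower, upper]
        total += value * (t - prev)    # area of the rectangle since the previous point
        value += delta
        prev = t
    return total + value * (upper - prev)


# Hypothetical change points: three overlapping page views
changes = [(10, 1), (20, 1), (25, -1), (30, 1), (40, -1), (90, -1)]

total_view_time = integrate(changes, 0, 100)
average_viewers = total_view_time / (100 - 0)
```

With this sample data the total view time is 95 minutes, so the average number of viewers over the 100 minute window is 0.95.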

We can find the average number of viewers, per hour of the day, and plot:

>>> import pandas as pd
>>> pd.Series([views.mean(60*i, 60*(i+1)) for i in range(24)]).plot()
plot of mean page views per hour

We can find the maximum concurrent views:

>>> views.max(0,1440)
16

We can create histogram data showing relative frequency of concurrent viewers (and plot it):

>>> views.hist(0,1440).plot.bar()
histogram plot of concurrent views
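
One way to read such a histogram is as the fraction of the time window spent at each value of the step function. The sketch below computes that on hypothetical data. It assumes this time-weighted interpretation, does not use staircase, and uses an illustrative helper named `time_fraction_per_value`.

```python
from collections import defaultdict


def time_fraction_per_value(changes, lower, upper):
    """Fraction of [lower, upper] spent at each value of a step function.

    changes: sorted (time, delta) pairs; the function is 0 before the first change.
    """
    durations = defaultdict(float)
    value, prev = 0, lower
    for t, delta in changes:
        t = min(max(t, lower), upper)  # clamp the change point into [lower, upper]
        if t > prev:
            durations[value] += t - prev
        value += delta
        prev = t
    if upper > prev:
        durations[value] += upper - prev
    span = upper - lower
    return {v: d / span for v, d in sorted(durations.items())}


# Hypothetical change points: three overlapping page views
changes = [(10, 1), (20, 1), (25, -1), (30, 1), (40, -1), (90, -1)]
frequencies = time_fraction_per_value(changes, 0, 100)
```

For this sample data the page has no viewers 20% of the time, one viewer 65% of the time and two viewers 15% of the time.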

Because plotting is based on matplotlib, it requires relatively little effort to take the previous chart and improve the aesthetics:

aesthetic histogram plot of concurrent views

See the case studies for more in-depth demonstrations of the staircase package.

The staircase API

The API Reference contains a detailed description of the staircase API. The reference describes how the methods work and which parameters can be used. It assumes that you have an understanding of the key concepts.