Working with dates
staircase is designed with two domain types in mind: real numbers and time or, in the language of Python computing, floats and datetimes. Even when the domain is real numbers, it is quite likely that those numbers semantically represent a time value.
In version 1 of staircase a user wanting to use datetimes needed to declare this when creating a staircase.Stairs instance. This is not required in v2:
In [1]: import pandas as pd
In [2]: import staircase as sc
In [3]: sf = sc.Stairs().layer(pd.Timestamp("2021"), pd.Timestamp("2022"), 3)
In [4]: sf.plot(arrows=True)
Out[4]: <AxesSubplot: >
For datetime domains v2 has been tested with numpy.datetime64, datetime.datetime, and pandas.Timestamp. This includes time-zone aware variants for the latter two of these. If you’re working with datetime domains then arguments to methods with domain-based parameters need to be datetime too. These methods include:
String representations of timestamps can also be used as arguments for staircase.Stairs.clip(), staircase.Stairs.mask() and staircase.Stairs.where().
Note that when using datetime domains an integral calculation will be a timedelta:
In [5]: sf.integral()
Out[5]: Timedelta('1095 days 00:00:00')
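For a single step, the integral is simply the step height multiplied by the width of the interval. The sketch below reproduces the figure above with plain pandas arithmetic rather than a staircase.Stairs object; the variable names are illustrative only.

```python
import pandas as pd

start, end, height = pd.Timestamp("2021"), pd.Timestamp("2022"), 3

width = end - start     # length of the interval as a Timedelta (2021 has 365 days)
area = height * width   # height times width, still a Timedelta

print(area)  # 1095 days 00:00:00
```

Multiplying a number by a Timedelta yields another Timedelta, which is why the integral over a datetime domain is expressed in units of time.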
Unfortunately the pandas.Timedelta class has limitations which may be exceeded with integral calculations (resulting in an overflow error). A workaround may involve scaling your step function values down, by dividing by a constant, before calculating the integral. This raises the question of whether the situation can be avoided by using numpy.datetime64 or datetime.datetime. It cannot: “under the hood” of staircase, datetimes are represented by pandas.Timestamp, even if the original data was another datetime class. This conversion is inherited from pandas, so the overflow error remains. If you wish to convert from pandas.Timestamp to another datetime class then the following methods may be of use:
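As a sketch of such conversions, pandas.Timestamp provides to_pydatetime() and to_datetime64(). The same block also illustrates the Timedelta range and the scaling workaround described above, again using plain pandas arithmetic on a single hypothetical step (height 300 over one year) rather than a staircase.Stairs object. The names and the scale factor of 1000 are assumptions chosen for illustration.

```python
import datetime

import numpy as np
import pandas as pd

ts = pd.Timestamp("2021-06-01 12:00")

# pandas.Timestamp -> datetime.datetime (standard library)
as_pydatetime = ts.to_pydatetime()

# pandas.Timestamp -> numpy.datetime64
as_datetime64 = ts.to_datetime64()

print(type(as_pydatetime), type(as_datetime64))

# Timedelta stores nanoseconds in a 64-bit integer, so its range is
# roughly 292 years. Express that limit as a multiple of a one-year width.
width = pd.Timestamp("2022") - pd.Timestamp("2021")  # 365 days
max_height = pd.Timedelta.max / width                # float ratio of two Timedeltas

# A step of height 300 over that year would exceed the limit, so scale the
# height down first, then undo the scaling on a plain float number of days.
height, scale = 300, 1000
scaled_area = (height / scale) * width               # Timedelta, well within range
area_in_days = scale * (scaled_area / pd.Timedelta(days=1))

print(max_height, area_in_days)
```

The final result is kept as a float number of days because converting it back to a Timedelta would reintroduce the overflow.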
Timezones
Datetime data can be timezone-naive or timezone-aware. For many datetime applications of staircase it may suffice to ignore the concept of timezones and work with timezone-naive data. This is an attractive option, as working with timezones, and converting between them, can be tricky. However, many countries observe Daylight Saving Time, which results in one day of the year having 23 hours and another having 25 hours:
In [6]: import pytz
In [7]: timezone = pytz.timezone('Australia/Sydney')
In [8]: sc.Stairs().layer(
   ...:     pd.Timestamp("2021-4-4", tz=timezone),
   ...:     pd.Timestamp("2021-4-5", tz=timezone),
   ...: ).integral()
   ...:
Out[8]: Timedelta('1 days 01:00:00')
In [9]: sc.Stairs().layer(
   ...:     pd.Timestamp("2021-10-3", tz=timezone),
   ...:     pd.Timestamp("2021-10-4", tz=timezone),
   ...: ).integral()
   ...:
Out[9]: Timedelta('0 days 23:00:00')
If you are computing some daily metric and do not take this into account then the calculations on those days will be incorrect. The consequences, and indeed the error in the calculated result, may be small enough to ignore. For some applications, however, the use of timezone-aware timestamps may be critical.
Given the sheer number of packages available for Python it may come as no surprise that there are several for dealing with timezones. There is one which is clearly the de facto standard: pytz. However, staircase supports any timezone package that pandas supports.
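As a rough illustration of timezone handling at the pandas level, the sketch below measures the length of the same Sydney day three ways: with the timezone given by name, with the timezone resolved through dateutil (selected in pandas with the "dateutil/" prefix), and with timezone-naive timestamps. The helper day_length is hypothetical and not part of staircase or pandas, and the example assumes a timezone database is available on the system.

```python
import pandas as pd


def day_length(date, next_date, tz):
    """Elapsed time between two local midnights in the given timezone."""
    return pd.Timestamp(next_date, tz=tz) - pd.Timestamp(date, tz=tz)


# Timezone given by name; pandas resolves it with its default timezone backend.
by_name = day_length("2021-04-04", "2021-04-05", "Australia/Sydney")

# Same timezone resolved through dateutil, selected with the "dateutil/" prefix.
by_dateutil = day_length("2021-04-04", "2021-04-05", "dateutil/Australia/Sydney")

# Timezone-naive timestamps know nothing about the clock change.
naive = day_length("2021-04-04", "2021-04-05", None)

print(by_name, by_dateutil, naive)
```

Both timezone-aware variants account for the extra hour on the day that daylight saving ends, while the naive calculation reports a plain 24 hours.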