User guide#

This guide gives a high-level overview of the relevant concepts. Click a topic on the left for details, or continue reading. For a summary of all time_split.split() parameters, see the overview page.

See also

The Examples page.

Types#

A single fold is a 3-tuple of bounds (start, mid, end) (type DatetimeSplitBounds). A list of folds is called ‘splits’ (type DatetimeSplits).
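To make the shape of these types concrete, here is a minimal sketch using only the standard library. The class and alias below are illustrative stand-ins that mirror the names above; they are not the library's own definitions, which may use a different timestamp type.

```python
from datetime import datetime
from typing import List, NamedTuple


class DatetimeSplitBounds(NamedTuple):
    """Illustrative stand-in: one fold as a (start, mid, end) 3-tuple."""

    start: datetime
    mid: datetime
    end: datetime


# Illustrative alias: 'splits' is a list of folds.
DatetimeSplits = List[DatetimeSplitBounds]

fold = DatetimeSplitBounds(
    start=datetime(2022, 1, 1),
    mid=datetime(2022, 1, 8),
    end=datetime(2022, 1, 9),
)
splits: DatetimeSplits = [fold]

# A named 3-tuple unpacks positionally and also supports attribute access.
start, mid, end = fold
```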

Conventions#

  • The ‘mid’ timestamp is assumed to be the (simulated) training date.

  • Data is restricted to start <= data.timestamp < mid.

  • Future data is restricted to mid <= future_data.timestamp < end.
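The conventions above describe two half-open intervals that meet at ‘mid’. A minimal sketch of applying them to a plain list of timestamps follows; the bounds and the daily timestamps are made-up values for illustration.

```python
from datetime import datetime

# Hypothetical fold bounds and observation timestamps, for illustration only.
start, mid, end = datetime(2022, 1, 1), datetime(2022, 1, 8), datetime(2022, 1, 10)
timestamps = [datetime(2022, 1, day) for day in range(1, 12)]  # Jan 1 .. Jan 11

# Half-open intervals: 'mid' belongs to the future data, 'end' to neither set.
data = [t for t in timestamps if start <= t < mid]
future_data = [t for t in timestamps if mid <= t < end]
```

Because the lower bound is inclusive and the upper bound exclusive, a timestamp can never appear in both the data and the future data of the same fold.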

Guarantees#

  • Splits are strictly increasing: For all indices i, splits[i].mid < splits[i+1].mid holds.

  • Timestamps within a fold are strictly increasing: start[i] < mid[i] < end[i].

  • If available data is given and expand_limits=False [1], no part of any fold will lie outside the available range.

  • Later folds are always preferred (see the skip and n_folds arguments).
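The first three guarantees can be expressed as a small checker, sketched below. The function check_guarantees is hypothetical and not part of the library. It also assumes that "inside the available range" means min(available) <= start and end <= max(available); the exact boundary handling is an assumption.

```python
from datetime import datetime


def check_guarantees(splits, available=None):
    """Return True if (start, mid, end) folds satisfy the first three guarantees.

    Hypothetical helper for illustration; not part of the library.
    """
    # Timestamps within a fold are strictly increasing.
    for start, mid, end in splits:
        if not (start < mid < end):
            return False

    # Folds are strictly increasing by their 'mid' timestamp.
    for prev, curr in zip(splits, splits[1:]):
        if not (prev[1] < curr[1]):
            return False

    # Assumed boundary handling: folds must lie within [min, max] of the data.
    if available is not None:
        lo, hi = min(available), max(available)
        if any(start < lo or end > hi for start, _, end in splits):
            return False

    return True
```

A check like this can be useful in tests for code that constructs or post-processes splits by hand.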

Limitations#

  • Data and Future data from different folds may overlap, depending on the split parameters.

  • Date restrictions apply only to the limits min(available) and max(available), so sparse data may create empty folds.

  • Schedule and Span arguments (before/after) must be strictly positive.
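The first two limitations can be illustrated with plain timestamps. In this sketch, all bounds and data are made-up values: two folds whose data ranges overlap, and a fold that lies inside the limits of sparse data but contains no observations.

```python
from datetime import datetime

# Overlap: the data ranges [start, mid) of two hypothetical folds.
fold_a = (datetime(2022, 1, 1), datetime(2022, 1, 8), datetime(2022, 1, 15))
fold_b = (datetime(2022, 1, 5), datetime(2022, 1, 12), datetime(2022, 1, 19))
# Two half-open intervals overlap if each starts before the other ends.
data_overlaps = fold_a[0] < fold_b[1] and fold_b[0] < fold_a[1]

# Sparse data: observations only at the very start and end of January.
available = [
    datetime(2022, 1, 1),
    datetime(2022, 1, 2),
    datetime(2022, 1, 30),
    datetime(2022, 1, 31),
]
start, mid, end = datetime(2022, 1, 10), datetime(2022, 1, 17), datetime(2022, 1, 24)

# The fold is within the limits of the available data...
inside_limits = min(available) <= start and end <= max(available)
# ...but neither of its two ranges contains any observations.
data = [t for t in available if start <= t < mid]
future_data = [t for t in available if mid <= t < end]
```

Checking for empty folds after splitting is therefore worthwhile when the data is known to have gaps.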

Footnotes