Time Split#

Time-based k-fold validation splits for heterogeneous data.

Experimenting with parameters#

The Time Split application (available here) is designed to help evaluate the effects of different parameters. To start it locally (requires the time-split[app] extra), run

python -m time_split app start

in the terminal. Alternatively, you may run

docker run -p 8501:8501 rsundqvist/time-split

to start the application using Docker instead.

See also

Click here to go to the public API of the application.

Use create_explorer_link() to build application URLs with preselected splitting parameters.

Parameter overview#

Overview of parameters used by time_split.split() and plot(). Integrations such as split_pandas() may add or remove parameters, but the base function remains the same unless otherwise stated.

Name: schedule
Default: N/A
Type: Valid Schedule types
Description: Generates training dates (DatetimeSplitBounds.mid). Examples:

  • ['2019-05-04', '2019-05-11'] | Hand-picked dates.

  • '7d' | every 7 days, aligned to the end of the available data.

  • '0 0 * * MON,FRI' | every Monday and Friday at midnight.

Name: before, after
Default: before='7d', after=1
Type: Valid Span types
Description: Range before/after schedule timestamps.

The default after=1 stretches the Future data until the next schedule timestamp, simulating models staying in production until a new model takes its place. That is, fold[i].end = fold[i + 1].mid for after=1.

Name: step
Default: 1
Type: int >= 1
Description: Keep every step-th fold in the schedule. Default (1) = keep all.

Name: n_splits
Default: 0
Type: int >= 0
Description: Maximum number of folds. Default (0) = keep all.

Name: available
Default: None
Type: DatetimeIterable
Description: Limits (min, max), or an iterable of datetime-like types that support the built-in min() and max() functions. Binds schedule to a range.

Name: expand_limits
Default: 'auto'
Type: Valid ExpandLimits types:

  • Literal 'auto' [2]

  • bool

  • 'round_to[<tolerance]'

Description: Expand available data outward to its likely "true" limits. Disabled if False, True == 'auto'. The tolerance argument is optional; expand_limits='d' performs regular floor(min) / ceil(max) rounding of the limits. Use expand_limits() to experiment.

Example: Passing expand_limits='d<3h' expands the (min, max) limits (derived from available) to the nearest day, at most 3 hours from the original limit.

Name: filter
Default: None
Type: Filter or str
Description: A callable (start, mid, end) -> bool. Strings are converted using get_by_full_name().
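The expand_limits='d<3h' example in the table can be illustrated with the standard library alone. The sketch below is not the library's implementation: the helper names (floor_to_day, ceil_to_day, expand_day_limits) are hypothetical, and it assumes that a limit is left unchanged when rounding it outward would move it by the tolerance or more.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

DAY = timedelta(days=1)


def floor_to_day(ts: datetime) -> datetime:
    """Round down to midnight of the same day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def ceil_to_day(ts: datetime) -> datetime:
    """Round up to the next midnight (unchanged if already at midnight)."""
    floored = floor_to_day(ts)
    return floored if floored == ts else floored + DAY


def expand_day_limits(
    lo: datetime,
    hi: datetime,
    tolerance: Optional[timedelta] = timedelta(hours=3),
) -> Tuple[datetime, datetime]:
    """Hypothetical stand-in for expand_limits='d<3h'.

    With tolerance=None this behaves like plain floor(min) / ceil(max)
    rounding, mirroring the description of expand_limits='d'.
    """
    new_lo = floor_to_day(lo)
    new_hi = ceil_to_day(hi)
    # Assumption: a limit that would move by the tolerance or more is kept as-is.
    if tolerance is not None and lo - new_lo >= tolerance:
        new_lo = lo
    if tolerance is not None and new_hi - hi >= tolerance:
        new_hi = hi
    return new_lo, new_hi


# Both limits are less than 3 hours from a day boundary, so both are expanded.
near = expand_day_limits(datetime(2019, 5, 1, 1, 30), datetime(2019, 5, 11, 22, 15))

# The lower limit is 6 hours past midnight, so only the upper limit is expanded.
far = expand_day_limits(datetime(2019, 5, 1, 6, 0), datetime(2019, 5, 11, 22, 15))
```

Here near covers 2019-05-01 00:00 to 2019-05-12 00:00, while far keeps its original lower limit of 2019-05-01 06:00. The expand_limits() function mentioned in the table is the way to check what the library itself does with a given setting.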

Later folds are always preferred [3]. For more information about the schedule, before/after, and expand_limits arguments, see the User guide. See the Examples page for plots using the various parameter options.
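To make the interaction between before, after=1, filter, step and n_splits concrete, here is a minimal standard-library sketch of the behaviour described in the parameter overview. It is an illustration under stated assumptions, not the library's code: the Fold tuple, the make_folds helper and its keep argument (standing in for filter) are hypothetical, the order in which the three reductions are applied is assumed, and the last schedule timestamp is dropped because it has no successor to supply its end.

```python
from datetime import datetime, timedelta
from typing import Callable, Iterable, List, NamedTuple, Optional


class Fold(NamedTuple):
    start: datetime  # start of the training data
    mid: datetime    # schedule timestamp (training date)
    end: datetime    # end of the Future data


def make_folds(
    schedule: Iterable[datetime],
    before: timedelta = timedelta(days=7),
    step: int = 1,
    n_splits: int = 0,
    keep: Optional[Callable[[datetime, datetime, datetime], bool]] = None,
) -> List[Fold]:
    """Hypothetical sketch of fold construction for after=1."""
    mids = sorted(schedule)
    # after=1: each fold ends where the next one begins its Future data,
    # i.e. fold[i].end == fold[i + 1].mid.
    # Assumption: the final timestamp has no successor and yields no fold.
    folds = [Fold(mid - before, mid, nxt) for mid, nxt in zip(mids, mids[1:])]
    if keep is not None:
        folds = [fold for fold in folds if keep(*fold)]
    # Keep every step-th fold, counting from the end so later folds win.
    folds = folds[::-1][::step][::-1]
    # n_splits=0 keeps everything; otherwise keep only the latest folds.
    if n_splits:
        folds = folds[-n_splits:]
    return folds


# Weekly hand-picked schedule: five timestamps give four folds.
weekly = [datetime(2019, 5, 4) + timedelta(days=7 * i) for i in range(5)]

all_folds = make_folds(weekly)
every_other = make_folds(weekly, step=2)
latest_only = make_folds(weekly, n_splits=1)
after_tenth = make_folds(weekly, keep=lambda start, mid, end: mid.day > 10)
```

With this schedule, every_other keeps the folds with mid on 2019-05-11 and 2019-05-25, and latest_only keeps only the 2019-05-25 fold, which reflects the preference for later folds.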

Footnotes
