time_split.support

Supporting functions.

These functions are used internally, but are also exposed here so that users may build their own logic on top of them, or simply experiment.

Warning

Not part of the stable API.

This module may change without notice. Stick to the top-level time_split module, or pin your dependencies if you need to use the support module.

Functions

create_explorer_link(host[, data, ...])

Create a Time Fold Explorer application URL.

default_metrics_formatter(end_message, metrics)

Default formatting implementation.

expand_limits(limits[, spec])

Derive the "real" bounds of limits.

fold_weight(splits, *[, unit, available])

Compute fold weights.

format_expanded_limits(original, *[, expanded])

Format expanded limits.

process_available(available, *, expand_limits)

Process a user-given available argument.

to_string()

Pretty-print a fold.

Classes

DatetimeIndexSplitter(schedule[, before, ...])

Backend interface for splitting user data.

class DatetimeIndexSplitter(schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64, before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d', after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1, step: int = 1, n_splits: int = 0, expand_limits: bool | Literal['auto'] | str = 'auto', ignore_filters: bool = False)

Bases: object

Backend interface for splitting user data. See the Parameter overview page.

schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64
before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d'
after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1
step: int = 1
n_splits: int = 0
expand_limits: bool | Literal['auto'] | str = 'auto'
ignore_filters: bool = False
get_splits(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) → list[DatetimeSplitBounds]

Compute splits of the given user data.

get_plot_data(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) → tuple[list[DatetimeSplitBounds], MaterializedSchedule]

Returns additional data needed to visualize folds.

as_dict() → DatetimeIndexSplitterKwargs

Returns the splitter as a dict.
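
To illustrate how schedule, before and after relate, here is a simplified sketch, not the library's implementation. It assumes that an int before/after counts schedule timestamps and that a timedelta is an offset from the scheduled time. The helper make_folds is hypothetical, and the sketch ignores data limits, 'all', expand_limits, step and n_splits.

```python
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple, Union

Fold = Tuple[datetime, datetime, datetime]


def make_folds(
    schedule: Sequence[datetime],
    before: Union[int, timedelta],
    after: Union[int, timedelta],
) -> List[Fold]:
    """Hypothetical helper: build (start, mid, end) folds from a sorted schedule."""
    folds = []
    for i, mid in enumerate(schedule):
        # Assumption: a timedelta is an offset, an int counts schedule timestamps.
        if isinstance(before, timedelta):
            start = mid - before
        elif i - before >= 0:
            start = schedule[i - before]
        else:
            continue  # Not enough earlier schedule timestamps for this fold.

        if isinstance(after, timedelta):
            end = mid + after
        elif i + after < len(schedule):
            end = schedule[i + after]
        else:
            continue  # Not enough later schedule timestamps for this fold.

        folds.append((start, mid, end))
    return folds


schedule = [datetime(2022, 1, day) for day in range(1, 5)]
print(make_folds(schedule, before=timedelta(days=7), after=1))
```

With the library itself, folds are obtained from get_splits(); the sketch only shows how the three timestamps of a fold could be derived.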

expand_limits(limits: tuple[Timestamp, Timestamp], spec: bool | Literal['auto'] | str | tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]] = 'auto') → tuple[Timestamp, Timestamp]

Derive the “real” bounds of limits.

Parameters:
  • limits – A tuple (min, max) of timestamps.

  • spec – Expansion spec as described in the User guide. Also supports level-tuples [(start_at, round_to, tolerance)...]. Passing expand_limits=[settings.auto_expand_limits.day, settings.auto_expand_limits.hour] is equivalent to expand_limits='auto'.

Returns:

Limits rounded according to the given specification.

Raises:

ValueError – For invalid limits.

Examples

>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")

Basic usage.

>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))

You may specify a maximum “distance” that limits may be expanded.

>>> expand_limits(limits, "d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))

Limits will never be rounded in the “wrong” direction…

>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))

…even if you make the tolerance large enough.

>>> expand_limits(limits, "d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
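
The examples above are consistent with the following reading, offered as an assumption rather than a description of the actual implementation: each limit is rounded to the nearest multiple of the unit, and the rounded value is kept only if it moves the limit outward and stays within the tolerance. A minimal sketch for fixed-size units, where the helper names are hypothetical and a string spec such as "d<1h" is replaced by plain timedelta arguments:

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

_ORIGIN = datetime(1970, 1, 1)  # Assumed anchor for rounding fixed-size units.


def round_nearest(ts: datetime, unit: timedelta) -> datetime:
    """Round ts to the nearest multiple of unit, counted from _ORIGIN."""
    quotient, remainder = divmod(ts - _ORIGIN, unit)
    if remainder * 2 >= unit:
        quotient += 1
    return _ORIGIN + quotient * unit


def expand_limits_sketch(
    limits: Tuple[datetime, datetime],
    unit: timedelta,
    tolerance: Optional[timedelta] = None,
) -> Tuple[datetime, datetime]:
    """Hypothetical helper: expand (low, high) outward to the nearest unit boundary."""
    low, high = limits
    new_low, new_high = round_nearest(low, unit), round_nearest(high, unit)

    def within(distance: timedelta) -> bool:
        # No tolerance means any outward distance is accepted.
        return tolerance is None or distance <= tolerance

    # Never round in the "wrong" direction: low may only move down, high only up.
    if not (new_low <= low and within(low - new_low)):
        new_low = low
    if not (new_high >= high and within(new_high - high)):
        new_high = high
    return new_low, new_high


limits = datetime(2019, 5, 11), datetime(2019, 5, 11, 22, 5, 30)
print(expand_limits_sketch(limits, timedelta(days=1)))
```

Under this reading, the maximum 11:05:30 rounds to the same day's midnight, which would shrink the limits, so it is left unchanged regardless of the tolerance.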

process_available(available: Iterable[str | Timestamp | datetime | date | datetime64], *, expand_limits: bool | Literal['auto'] | str) → ProcessAvailableResult

Process a user-given available argument.

Parameters:
  • available – Available data from the user. May be None.

  • expand_limits – Expansion spec as described in the User guide. Determines how much (if at all) to expand limits.

Returns:

A tuple (available, limits). Note that available will be None if it has not been iterated over. This ensures that iterables are not consumed unless needed.

Raises:

ValueError – For invalid available arguments.

default_metrics_formatter(end_message: str, metrics: dict[Any, Any] | Series | DataFrame | str | Any) → str

Default formatting implementation.

Format using an appropriate pandas to_string()-method if metrics is a dict or a pandas type. Nested dictionaries are flattened using flatten_dict() if metrics is a dict-of-dicts.

Metrics of type str are assumed to be preformatted, and are appended to end_message as-is.

If any other types are given, fall back to f"{end_message} Metrics: {metrics}".

Examples

Formatting a nested dict.

>>> metrics = {"rmse": {"train": 0.11, "test": 0.5, "future": 20.19}}
>>> print(default_metrics_formatter("End message.", metrics))
End message. Fold metrics:
      train  test  future
rmse   0.11   0.5   20.19

Formatting a pandas.DataFrame.

>>> import pandas as pd
>>> metrics = {"me": [0.1, 0.2, 0.3], "rmse": [0.11, 0.5, 20.19]}
>>> df = pd.DataFrame(metrics, index=["train", "test", "future"])
>>> print(default_metrics_formatter("End message.", df))
End message. Fold metrics:
         me   rmse
train   0.1   0.11
test    0.2   0.50
future  0.3  20.19

The index is printed unless it is a pandas.RangeIndex.
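
The type dispatch described above can be sketched with the standard library alone. This is a hypothetical stand-in, not the library function: the dict branch renders plain "key: value" lines instead of calling a pandas to_string()-method, and the separator used for preformatted strings is an assumption. Only the fallback format is taken from the description above.

```python
from typing import Any


def format_metrics_sketch(end_message: str, metrics: Any) -> str:
    """Hypothetical stand-in for the dispatch in default_metrics_formatter."""
    if isinstance(metrics, str):
        # Preformatted: append as-is (a single space separator is assumed).
        return f"{end_message} {metrics}"
    if isinstance(metrics, dict):
        # Stand-in for the pandas to_string() branch: one line per entry.
        lines = [f"{key}: {value}" for key, value in metrics.items()]
        return "\n".join([f"{end_message} Fold metrics:", *lines])
    # Fallback for any other type, as described above.
    return f"{end_message} Metrics: {metrics}"


print(format_metrics_sketch("End message.", 0.5))
```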

fold_weight(splits: list[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) → list[DatetimeSplitCounts]

Compute fold weights.

Parameters:
  • splits – List of DatetimeSplitBounds.

  • unit – Time unit of the returned count, or ‘rows’ (requires available data).

  • available – Available data. Required when unit='rows'.

Returns:

A list of tuples [(n_data_units, n_future_data_units), ...].

Raises:

ValueError – If unit='rows' and available=None.
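
A minimal sketch of the computation, assuming each fold (start, mid, end) covers the half-open data range [start, mid) and future range [mid, end), as suggested by the to_string() sample output. The name fold_weight_sketch is hypothetical, and only the 'rows', 'hours' and 'days' units are supported here.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Fold = Tuple[datetime, datetime, datetime]

_UNITS = {"hours": timedelta(hours=1), "days": timedelta(days=1)}


def fold_weight_sketch(
    splits: List[Fold],
    *,
    unit: str = "hours",
    available: Optional[Iterable[datetime]] = None,
) -> List[Tuple[float, float]]:
    """Hypothetical helper returning (n_data_units, n_future_data_units) per fold."""
    if unit == "rows":
        if available is None:
            raise ValueError("unit='rows' requires available data.")
        points = list(available)  # Materialize once; available may be a one-shot iterable.
        return [
            (
                sum(start <= t < mid for t in points),  # Rows in [start, mid).
                sum(mid <= t < end for t in points),  # Rows in [mid, end).
            )
            for start, mid, end in splits
        ]
    size = _UNITS[unit]
    return [((mid - start) / size, (end - mid) / size) for start, mid, end in splits]


fold = (datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18))
print(fold_weight_sketch([fold]))
```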

to_string(bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str | None = None) → str

Pretty-print a fold.

Sample output.
('2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00')
Parameters:
  • bounds – A fold tuple (start, mid, end), or just start, in which case mid and end must be passed as separate positional arguments.

  • mid – Datetime-like. Must be None when bounds is a tuple.

  • end – Datetime-like. Must be None when bounds is a tuple.

  • format – A custom format to use. Use FOLD_FORMAT if None, but note that only the start, mid and end keys are available to this function.

Returns:

Formatted bounds string.

Raises:

TypeError – If an incorrect number of timestamps are given.
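
The sample output can be reproduced with a small formatter. This is a hypothetical sketch, not FOLD_FORMAT itself: it assumes that timestamps at midnight are shown as dates only and that the weekday name of mid is appended. It also mirrors the documented TypeError for an incorrect number of timestamps.

```python
from datetime import datetime, time
from typing import Optional, Tuple, Union

Fold = Tuple[datetime, datetime, datetime]


def _format_timestamp(ts: datetime) -> str:
    # Assumed convention: midnight timestamps are shown as dates only.
    return ts.date().isoformat() if ts.time() == time() else str(ts)


def fold_to_string(
    bounds: Union[datetime, Fold],
    mid: Optional[datetime] = None,
    end: Optional[datetime] = None,
) -> str:
    """Hypothetical formatter mimicking the sample output of to_string()."""
    message = "Pass a (start, mid, end) tuple, or start, mid and end separately."
    if isinstance(bounds, tuple):
        if mid is not None or end is not None or len(bounds) != 3:
            raise TypeError(message)
        start, mid, end = bounds
    elif mid is None or end is None:
        raise TypeError(message)
    else:
        start = bounds
    return (
        f"('{_format_timestamp(start)}' <= [schedule: '{_format_timestamp(mid)}' ({mid:%A})]"
        f" < '{_format_timestamp(end)}')"
    )


fold = (datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18))
print(fold_to_string(fold))
```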

format_expanded_limits(original: tuple[Timestamp, Timestamp], *, expanded: tuple[Timestamp, Timestamp] | None = None, expand_limits: bool | Literal['auto'] | str) → str

Format expanded limits.

Parameters:
  • original – The original data limits.

  • expanded – Expanded data limits. Derived from original and expand_limits if None.

  • expand_limits – Limits expansion spec.

Returns:

A string.

create_explorer_link(host[, data, ...])

Create a Time Fold Explorer application URL.

Parameters:
  • host – Base address where the application is hosted.

  • data – Binds schedule to a range. Regular available arguments (as passed to e.g. time_split.split()) are encoded as a date range to generate dummy data for. Pass a str to use a dataset bundled with the server instead. Note that this function cannot verify the kwargs if data (or available) is a str dataset.

  • available – Alias of data.

  • skip_default – If True, do not include default split params in the link.

  • show_removed – If True, splits removed by n_splits or step are included in the figure.

  • kwargs – Keyword arguments for the time_split.split()-function.

Returns:

An encoded URL.

Examples

Getting the URL for a local host.

>>> create_explorer_link(
...     host="http://localhost:8501",
...     available=("2019-04-11 00:35:00", "2019-05-11 21:30:00"),
...     schedule="0 0 * * MON,FRI",
... )
'http://localhost:8501?data=1554942900-1557610200&schedule=0+0+%2A+%2A+MON%2CFRI&show_removed=True'
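
The example URL is consistent with encoding data as UTC epoch seconds joined by "-" and URL-encoding the remaining arguments with urllib.parse.urlencode. The sketch below reproduces it under those assumptions; explorer_link_sketch is hypothetical and implements neither skip_default nor argument validation.

```python
from datetime import datetime, timezone
from typing import Iterable
from urllib.parse import urlencode


def explorer_link_sketch(
    host: str, available: Iterable[str], schedule: str, show_removed: bool = True
) -> str:
    """Hypothetical reconstruction of the URL format shown in the example."""
    # Assumption: naive timestamps are interpreted as UTC and encoded as epoch seconds.
    parsed = [datetime.fromisoformat(a).replace(tzinfo=timezone.utc) for a in available]
    data = f"{int(min(parsed).timestamp())}-{int(max(parsed).timestamp())}"
    # urlencode() quotes values with quote_plus: spaces become "+", "*" becomes "%2A".
    query = urlencode({"data": data, "schedule": schedule, "show_removed": show_removed})
    return f"{host}?{query}"


print(
    explorer_link_sketch(
        "http://localhost:8501",
        ("2019-04-11 00:35:00", "2019-05-11 21:30:00"),
        "0 0 * * MON,FRI",
    )
)
```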

To start the application locally using Docker, run

docker run -p 8501:8501 rsundqvist/time-split

in the terminal.

Modules

types

Internal types.