time_split.support

Supporting functions.

These functions are used internally, but are exposed here so that users may build their own logic on top of them, or just to test things out.

Warning

Not part of the stable API.

This module may change without notice. Stick to the top-level time_split module, or pin your dependencies if you need to use the support module.

Functions

default_metrics_formatter(end_message, metrics)

Default formatting implementation.

expand_limits(limits[, spec])

Derive the "real" bounds of limits.

fold_weight(splits, *[, unit, available])

Compute fold weights.

format_expanded_limits(original, *[, ...])

Format expanded limits.

process_available(available, *, expand_limits)

Process a user-given available argument.

to_string()

Pretty-print a fold.

Classes

DatetimeIndexSplitter(schedule[, before, ...])

Backend interface for splitting user data.

class DatetimeIndexSplitter(schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64, before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d', after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1, step: int = 1, n_splits: int = 0, expand_limits: bool | Literal['auto'] | str = 'auto', ignore_filters: bool = False, filter: Callable[[Timestamp, Timestamp, Timestamp], bool] | str | None = None)

Bases: object

Backend interface for splitting user data. See the Parameter overview page.

schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64
before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d'
after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1
step: int = 1
n_splits: int = 0
expand_limits: bool | Literal['auto'] | str = 'auto'
ignore_filters: bool = False
filter: Callable[[Timestamp, Timestamp, Timestamp], bool] | str | None = None

get_splits(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitBounds]

Compute a split of given user data.

get_plot_data(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> tuple[list[DatetimeSplitBounds], MaterializedSchedule]

Returns additional data needed to visualize folds.

as_dict() -> DatetimeIndexSplitterKwargs

Returns the splitter as a dict.

default_metrics_formatter(end_message: str, metrics: dict[Any, Any] | Series | DataFrame | str | Any) -> str

Default formatting implementation.

Format using the appropriate pandas to_string() method if metrics is a dict or a pandas type. If metrics is a dict-of-dicts, the nested dictionaries are flattened using flatten_dict().

Metrics of type str are assumed to be preformatted, and are appended to end_message as-is.

If any other types are given, fall back to f"{end_message} Metrics: {metrics}".

Examples

Formatting a nested dict.

>>> metrics = {"rmse": {"train": 0.11, "test": 0.5, "future": 20.19}}
>>> print(default_metrics_formatter("End message.", metrics))
End message. Fold metrics:
      train  test  future
rmse   0.11   0.5   20.19

Formatting a pandas.DataFrame.

>>> metrics = {"me": [0.1, 0.2, 0.3], "rmse": [0.11, 0.5, 20.19]}
>>> df = pd.DataFrame(metrics, index=["train", "test", "future"])
>>> print(default_metrics_formatter("End message.", df))
End message. Fold metrics:
         me   rmse
train   0.1   0.11
test    0.2   0.50
future  0.3  20.19

The index is printed unless it is a pandas.RangeIndex.
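
The dispatch rules described above can be sketched in plain pandas. This is only an illustration, not the library's implementation: sketch_formatter is a hypothetical name, the single-space separator for str metrics is an assumption, and a dict-of-dicts is turned into a DataFrame directly instead of going through flatten_dict().

```python
import pandas as pd


def sketch_formatter(end_message, metrics):
    """Hypothetical re-creation of the documented dispatch rules."""
    # Strings are assumed to be preformatted and are appended as-is.
    # Assumption: a single space separates the two parts.
    if isinstance(metrics, str):
        return f"{end_message} {metrics}"

    # Dicts are converted to a pandas type first.
    # Assumption: dict-of-dicts -> one row per outer key.
    if isinstance(metrics, dict):
        nested = bool(metrics) and all(isinstance(v, dict) for v in metrics.values())
        if nested:
            metrics = pd.DataFrame.from_dict(metrics, orient="index")
        else:
            metrics = pd.Series(metrics)

    # pandas types are rendered with to_string(); a RangeIndex is hidden.
    if isinstance(metrics, (pd.Series, pd.DataFrame)):
        show_index = not isinstance(metrics.index, pd.RangeIndex)
        return f"{end_message} Fold metrics:\n{metrics.to_string(index=show_index)}"

    # Anything else uses the documented fallback format.
    return f"{end_message} Metrics: {metrics}"
```

Calling sketch_formatter("Done.", 3.5) takes the fallback branch, while a str argument is returned unchanged after the end message.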

expand_limits(limits: tuple[Timestamp, Timestamp], spec: bool | Literal['auto'] | str | tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]] = 'auto') -> tuple[Timestamp, Timestamp]

Derive the “real” bounds of limits.

Use time_split.settings.misc.round_limits to allow inward “expansion”.

Parameters:
  • limits – A tuple (min, max) of timestamps.

  • spec – Expansion spec as described in the User guide. Also supports level-tuples [(start_at, round_to, tolerance)...]. Passing expand_limits=[settings.auto_expand_limits.day, settings.auto_expand_limits.hour] is equivalent to expand_limits='auto'.

Returns:

Limits rounded according to the given specification.

Raises:

ValueError – For invalid limits.

Examples

>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")

Basic usage.

>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))

You may specify a maximum “distance” that limits may be expanded.

>>> expand_limits(limits, "d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))

Limits will never be rounded in the “wrong” direction…

>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))

…even if you make the tolerance large enough.

>>> expand_limits(limits, "d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))

To disable this restriction, set round_limits=True in the settings.
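
The behaviour shown in the examples above is consistent with rounding each limit to the nearest multiple of the frequency, and then keeping the rounded value only if it moves outward and stays within the tolerance. A minimal sketch under that assumption follows. sketch_expand is a hypothetical helper, it takes the frequency and tolerance as separate arguments instead of parsing a spec string such as "d<1h", and whether the tolerance comparison is strict is a guess.

```python
import pandas as pd


def sketch_expand(limits, freq="D", tolerance=None):
    """Hypothetical outward-only rounding of (start, end) limits."""
    start, end = limits
    tol = None if tolerance is None else pd.Timedelta(tolerance)

    def within(distance):
        # Assumption: a distance equal to the tolerance is still accepted.
        return tol is None or distance <= tol

    # Round to the nearest multiple of freq, then reject inward moves.
    new_start = start.round(freq)
    if new_start > start or not within(start - new_start):
        new_start = start

    new_end = end.round(freq)
    if new_end < end or not within(new_end - end):
        new_end = end

    return new_start, new_end
```

With the limits used in the examples, sketch_expand(limits, "D", "1h") leaves the end untouched because the nearest midnight is more than one hour away.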

fold_weight(splits: list[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitCounts]

Compute fold weights.

Parameters:
  • splits – List of DatetimeSplitBounds.

  • unit – Time unit of the returned count, or ‘rows’ (requires available data).

  • available – Available data. Required when unit='rows'.

Returns:

A list of tuples [(n_data_units, n_future_data_units), ...].

Raises:

ValueError – If unit='rows' and available=None.
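
To make the return value concrete, here is a rough sketch of how such weights could be computed with plain pandas. It is not the library's code: sketch_fold_weight is a hypothetical name, folds are given as plain (start, mid, end) tuples, the half-open intervals [start, mid) and [mid, end) are an assumption, and time units are truncated to whole numbers.

```python
import pandas as pd

# Time units supported by this sketch (assumed from the signature above).
_UNITS = {"hours": pd.Timedelta(hours=1), "days": pd.Timedelta(days=1)}


def sketch_fold_weight(splits, *, unit="hours", available=None):
    """Hypothetical (n_data_units, n_future_data_units) per fold."""
    if unit == "rows":
        if available is None:
            raise ValueError("unit='rows' requires available data")
        index = pd.to_datetime(list(available))

    weights = []
    for start, mid, end in splits:
        start, mid, end = (pd.Timestamp(ts) for ts in (start, mid, end))
        if unit == "rows":
            # Count timestamps in [start, mid) and [mid, end).
            n_data = int(((index >= start) & (index < mid)).sum())
            n_future = int(((index >= mid) & (index < end)).sum())
        else:
            # Whole time units covered by each part of the fold.
            step = _UNITS[unit]
            n_data = (mid - start) // step
            n_future = (end - mid) // step
        weights.append((n_data, n_future))
    return weights
```

For the fold ("2021-12-30", "2022-01-04", "2022-01-04 18:00:00"), the sketch gives 120 hours of data and 18 hours of future data.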

format_expanded_limits(original: tuple[Timestamp, Timestamp] | Iterable[str | Timestamp | datetime | date | datetime64], *, expanded: tuple[Timestamp, Timestamp] | None = None, expand_limits: bool | Literal['auto'] | str = 'auto', raise_if_same: bool = False) -> str

Format expanded limits.

Parameters:
  • original – The original data limits.

  • expanded – Expanded data limits. Derived from original and expand_limits if None.

  • expand_limits – Limits expansion spec.

  • raise_if_same – If True, raise a ValueError if the original and expanded limits are the same. Otherwise, a different message is returned instead.

Returns:

A string.

Raises:

ValueError – If raise_if_same is True and original == expanded.

Examples

Basic usage.

>>> limits = "2019-05-11", "2019-05-11 22:05:30"
>>> string = format_expanded_limits(limits, expand_limits="d<3h")
>>> print(string)
Available data limits have been expanded (since expand_limits='d<3h'):
  start: 2019-05-11 00:00:00 -> <no change>
    end: 2019-05-11 22:05:30 -> 2019-05-12 (+1h 54m 30s)

A different message is shown when the limits aren’t expanded.

>>> string = format_expanded_limits(limits, expand_limits="d<1h")
>>> print(string)
Original limits ('2019-05-11', '2019-05-11 22:05:30') were not expanded (since expand_limits='d<1h').

Set raise_if_same=True to raise a ValueError instead of returning the second message.

See also

The process_available() and expand_limits() functions.

process_available(available: Iterable[str | Timestamp | datetime | date | datetime64], *, expand_limits: bool | Literal['auto'] | str) -> ProcessAvailableResult

Process a user-given available argument.

Parameters:
  • available – Available data from user. May be None.

  • expand_limits – Expansion spec as described in the User guide. Determines how much (if at all) to expand limits.

Returns:

A tuple (available, limits). Note that available will be None if it has not been iterated over. This ensures that iterables are not consumed unless needed.

Raises:

ValueError – For invalid available arguments.

to_string(start_or_bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str | None = None) -> str

Pretty-print a fold.

Parameters:
  • start_or_bounds – A fold tuple (start, mid, end), or just start (followed by mid and end).

  • mid – Datetime-like. Must be None when start_or_bounds is a tuple.

  • end – Datetime-like. Must be None when start_or_bounds is a tuple.

  • format – A custom format to use. Use FOLD_FORMAT if None, but note that only the start, mid and end keys are available to this function.

Returns:

Formatted bounds string.

Raises:

TypeError – If an incorrect number of timestamps are given.

Examples

Sample format output.

>>> to_string("2021-12-30", "2022-01-04", "2022-01-04 18:00:00")
"'2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00'"

The default format was used above.

Using properties. The delta is the distance from mid, formatted by format_seconds(). The date property returns a datetime.date object.

>>> to_string(
...     ("2021-12-30", "2022-01-04", "2022-01-04 18:00:00"),
...     format="'{start.date}' [-{start.delta}] <= '{mid.iso}' < [+{end.delta}]",
... )
"'2021-12-30' [-5d] <= '2022-01-04T00:00:00' < [+18h]"

The delta is always positive; you must add the sign yourself.
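
The two calling conventions (one tuple, or three separate positional arguments) can be illustrated with a small stand-in. This is a sketch only: sketch_to_string is a hypothetical name, it hard-codes a layout that mimics the default output shown above instead of reading FOLD_FORMAT, and it does not support the format argument.

```python
import pandas as pd


def _pretty(ts):
    # Drop the time part for midnight timestamps, as in the sample output.
    ts = pd.Timestamp(ts)
    return ts.strftime("%Y-%m-%d") if ts == ts.normalize() else str(ts)


def sketch_to_string(start_or_bounds, mid=None, end=None, /):
    """Hypothetical fold printer accepting a tuple or three timestamps."""
    if isinstance(start_or_bounds, tuple):
        # Tuple form: mid and end must not be given separately.
        if mid is not None or end is not None or len(start_or_bounds) != 3:
            raise TypeError("expected exactly one (start, mid, end) tuple")
        start, mid, end = start_or_bounds
    else:
        # Positional form: all three timestamps are required.
        if mid is None or end is None:
            raise TypeError("expected start, mid and end")
        start = start_or_bounds

    weekday = pd.Timestamp(mid).day_name()
    return f"'{_pretty(start)}' <= [schedule: '{_pretty(mid)}' ({weekday})] < '{_pretty(end)}'"
```

Both sketch_to_string(start, mid, end) and sketch_to_string((start, mid, end)) produce the same string, and passing only a start raises TypeError.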

Modules

types

Internal types.