time_split.support
Supporting functions.
These functions are used internally, but are also exposed here so that users may build their own logic on top of them, or simply test things out.
Warning
Not part of the stable API.
This module may change without notice. Stick to the top-level time_split module, or lock down your dependencies if you need to use the support module.
Functions
default_metrics_formatter – Default formatting implementation.
expand_limits – Derive the "real" bounds of limits.
fold_weight – Compute fold weights.
format_expanded_limits – Format expanded limits.
process_available – Process a user-given available argument.
to_string – Pretty-print a fold.
Classes
DatetimeIndexSplitter – Backend interface for splitting user data.
- class DatetimeIndexSplitter(schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64, before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d', after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1, step: int = 1, n_splits: int = 0, expand_limits: bool | Literal['auto'] | str = 'auto', ignore_filters: bool = False)
Bases: object
Backend interface for splitting user data. See the Parameter overview page.
- schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64
- get_splits(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitBounds]
Compute a split of given user data.
- get_plot_data(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> tuple[list[DatetimeSplitBounds], MaterializedSchedule]
Returns additional data needed to visualize folds.
- as_dict() -> DatetimeIndexSplitterKwargs
Returns the splitter as a dict.
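Each fold produced by a splitter is a (start, mid, end) triple, matching the start <= [schedule: mid] < end layout that to_string() prints further down. The sketch below shows that shape for the simplest case only. It assumes before and after are plain timedeltas; the real class also accepts integers, 'all' and strings, and applies step, n_splits and filtering against available data, none of which is modeled here. The helper name sketch_split_bounds is hypothetical and not part of time_split.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple, Union

Fold = Tuple[datetime, datetime, datetime]


def sketch_split_bounds(
    schedule: Iterable[Union[str, datetime]],
    before: timedelta,
    after: timedelta,
) -> List[Fold]:
    """Hypothetical helper: build one (start, mid, end) fold per schedule entry."""
    folds = []
    for entry in schedule:
        # Schedule entries may be ISO strings or datetime objects.
        mid = datetime.fromisoformat(entry) if isinstance(entry, str) else entry
        # Data before mid lies in [start, mid); future data lies in [mid, end).
        folds.append((mid - before, mid, mid + after))
    return folds


folds = sketch_split_bounds(["2022-01-04", "2022-01-11"], timedelta(days=7), timedelta(hours=18))
```

In the real library, get_splits() is the method that returns the list of DatetimeSplitBounds.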
- default_metrics_formatter(end_message: str, metrics: dict[Any, Any] | Series | DataFrame | str | Any) -> str
Default formatting implementation.
Format using an appropriate pandas to_string()-method if metrics is a dict or a pandas type. Nested dictionaries are flattened using flatten_dict() if metrics is a dict-of-dicts.
Metrics of type str are assumed to be preformatted, and are appended to end_message as-is.
If any other types are given, fall back to f"{end_message} Metrics: {metrics}".
Examples
Formatting a nested dict.
>>> metrics = {"rmse": {"train": 0.11, "test": 0.5, "future": 20.19}}
>>> print(default_metrics_formatter("End message.", metrics))
End message. Fold metrics:
      train  test  future
rmse   0.11   0.5   20.19
Formatting a pandas.DataFrame.
>>> metrics = {"me": [0.1, 0.2, 0.3], "rmse": [0.11, 0.5, 20.19]}
>>> df = pd.DataFrame(metrics, index=["train", "test", "future"])
>>> print(default_metrics_formatter("End message.", df))
End message. Fold metrics:
         me   rmse
train   0.1   0.11
test    0.2   0.50
future  0.3  20.19
The index is printed unless it is a pandas.RangeIndex.
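The type dispatch described for default_metrics_formatter can be sketched without pandas. This is a simplified stand-in, not the library's implementation: dicts are rendered as plain "row column value" lines instead of a pandas table, pandas objects fall through to the generic branch, and the single-space separator used for preformatted strings is an assumption. The names flatten_dict_sketch and format_metrics_sketch are hypothetical.

```python
from typing import Any, Dict, Tuple


def flatten_dict_sketch(nested: Dict[Any, Any]) -> Dict[Tuple[Any, ...], Any]:
    """Hypothetical flattener: {"rmse": {"train": 0.11}} -> {("rmse", "train"): 0.11}."""
    flat: Dict[Tuple[Any, ...], Any] = {}
    for outer, inner in nested.items():
        if isinstance(inner, dict):
            for key, value in inner.items():
                flat[(outer, key)] = value
        else:
            flat[(outer,)] = inner
    return flat


def format_metrics_sketch(end_message: str, metrics: Any) -> str:
    """Hypothetical, pandas-free version of the dispatch rules listed above."""
    if isinstance(metrics, str):
        # Preformatted: appended as-is (the single-space separator is an assumption).
        return f"{end_message} {metrics}"
    if isinstance(metrics, dict):
        # Plain "row column value" lines stand in for pandas' to_string() table.
        rows = [
            " ".join(map(str, key)) + f" {value}"
            for key, value in flatten_dict_sketch(metrics).items()
        ]
        return end_message + " Fold metrics:\n" + "\n".join(rows)
    # Anything else falls back to the generic representation.
    return f"{end_message} Metrics: {metrics}"
```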
- expand_limits(limits: tuple[Timestamp, Timestamp], spec: bool | Literal['auto'] | str | tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]] = 'auto') -> tuple[Timestamp, Timestamp]
Derive the “real” bounds of limits.
Use time_split.settings.misc.round_limits to allow inward “expansion”.
- Parameters:
limits – A tuple (min, max) of timestamps.
spec – Expansion spec as described in the User guide. Also supports level-tuples [(start_at, round_to, tolerance), ...]. Passing expand_limits=[settings.auto_expand_limits.day, settings.auto_expand_limits.hour] is equivalent to expand_limits='auto'.
- Returns:
Limits rounded according to the given specification.
- Raises:
ValueError – For invalid limits.
Examples
>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")
Basic usage.
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))
You may specify a maximum “distance” that limits may be expanded.
>>> expand_limits(limits, "d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))
Limits will never be rounded in the “wrong” direction…
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
…even if you make the tolerance large enough.
>>> expand_limits(limits, "d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
To disable this restriction, set round_limits=True in the settings.
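The examples above are consistent with a simple rule: round each limit to the nearest unit, keep the result only if it moves the limit outward (start down, end up) and stays within the tolerance. The stdlib sketch below implements that inferred rule for a single (unit, tolerance) level. It is an assumption reconstructed from the examples, not the library's code: it ignores string specs, start_at, multi-level specs and the default tolerances used by 'auto'. The names round_nearest and expand_limits_sketch, and the allow_inward flag standing in for round_limits, are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple

EPOCH = datetime(1970, 1, 1)
Limits = Tuple[datetime, datetime]


def round_nearest(ts: datetime, unit: timedelta) -> datetime:
    """Round ts to the nearest multiple of unit, counted from the Unix epoch."""
    offset = (ts - EPOCH) % unit  # always >= 0 for a positive unit
    floor = ts - offset
    return floor + unit if offset * 2 >= unit else floor


def expand_limits_sketch(
    limits: Limits,
    unit: timedelta,
    tolerance: Optional[timedelta] = None,
    allow_inward: bool = False,
) -> Limits:
    """Hypothetical sketch of the rounding rules shown in the examples above."""
    start, end = limits
    rounded_start = round_nearest(start, unit)
    rounded_end = round_nearest(end, unit)

    def accept(original: datetime, rounded: datetime, outward: bool) -> datetime:
        # Reject inward rounding unless explicitly allowed (cf. round_limits).
        if not (outward or allow_inward):
            return original
        # Reject rounding that moves the limit further than the tolerance.
        if tolerance is not None and abs(rounded - original) > tolerance:
            return original
        return rounded

    return (
        accept(start, rounded_start, rounded_start <= start),
        accept(end, rounded_end, rounded_end >= end),
    )
```

With unit=timedelta(days=1), the four documented cases map to tolerance=None, timedelta(hours=1), None and timedelta(hours=14).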
- fold_weight(splits: list[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitCounts]
Compute fold weights.
- Parameters:
splits – List of DatetimeSplitBounds.
unit – Time unit of the returned count, or ‘rows’ (requires available data).
available – Available data. Required when unit='rows'.
- Returns:
A list of tuples [(n_data_units, n_future_data_units), ...].
- Raises:
ValueError – If unit='rows' and available=None.
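A fold's weight is the size of its two halves: the data part [start, mid) and the future part [mid, end). The sketch below computes that either as a time span or as a row count over available timestamps, and raises ValueError when rows are requested without data, as documented. The half-open interval convention is taken from the to_string() output format; only 'hours', 'days' and 'rows' are handled, and the name fold_weight_sketch is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

Fold = Tuple[datetime, datetime, datetime]
UNIT_SIZES = {"hours": timedelta(hours=1), "days": timedelta(days=1)}


def fold_weight_sketch(
    splits: List[Fold],
    *,
    unit: str = "hours",
    available: Optional[Iterable[datetime]] = None,
) -> List[Tuple[float, float]]:
    """Hypothetical: (n_data_units, n_future_data_units) for each fold."""
    if unit == "rows":
        if available is None:
            raise ValueError("unit='rows' requires available data")
        points = list(available)  # materialize once; the input may be an iterator
        return [
            (
                sum(start <= t < mid for t in points),  # rows in [start, mid)
                sum(mid <= t < end for t in points),  # rows in [mid, end)
            )
            for start, mid, end in splits
        ]
    size = UNIT_SIZES[unit]
    # timedelta / timedelta yields a float number of units.
    return [((mid - start) / size, (end - mid) / size) for start, mid, end in splits]
```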
- format_expanded_limits(original: tuple[Timestamp, Timestamp], *, expanded: tuple[Timestamp, Timestamp] | None = None, expand_limits: bool | Literal['auto'] | str) -> str
Format expanded limits.
- Parameters:
original – The original data limits.
expanded – Expanded data limits. Derived from original and expand_limits if None.
expand_limits – Limits expansion spec.
- Returns:
A string.
- process_available(available: Iterable[str | Timestamp | datetime | date | datetime64], *, expand_limits: bool | Literal['auto'] | str) -> ProcessAvailableResult
Process a user-given available argument.
- Parameters:
available – Available data from user. May be None.
expand_limits – Expansion spec as described in the User guide. Determines how much (if at all) to expand limits.
- Returns:
A tuple (available, limits). Note that available will be None if it has not been iterated over. This ensures that iterables are not consumed unless needed.
- Raises:
ValueError – For invalid available arguments.
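The note about consumed iterables matters because a generator can only be traversed once: computing its limits exhausts it. A common pattern is to materialize such input exactly once and hand the copy back, returning None when the caller's own object is still reusable. The sketch below shows that general pattern under the assumption that available holds ISO strings or datetime objects; it is not the library's implementation, and process_available_sketch is a hypothetical name.

```python
from collections.abc import Sequence
from datetime import datetime
from typing import Iterable, List, Optional, Tuple, Union

Limits = Tuple[datetime, datetime]


def _to_datetime(value: Union[str, datetime]) -> datetime:
    return datetime.fromisoformat(value) if isinstance(value, str) else value


def process_available_sketch(
    available: Optional[Iterable[Union[str, datetime]]],
) -> Tuple[Optional[List[datetime]], Optional[Limits]]:
    """Hypothetical: return (materialized_or_None, (min, max))."""
    if available is None:
        return None, None
    # Sequences (lists, tuples) survive iteration; generators do not.
    reusable = isinstance(available, Sequence) and not isinstance(available, str)
    points = [_to_datetime(v) for v in available]  # iterates exactly once
    if not points:
        raise ValueError("available must contain at least one timestamp")
    limits = (min(points), max(points))
    # Only hand back the copy when the caller's object has been exhausted.
    return (None if reusable else points), limits
```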
- to_string(start_or_bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str | None = None) -> str
Pretty-print a fold.
- Parameters:
start_or_bounds – A fold tuple (start, mid, end), or just start (followed by mid and end).
mid – Datetime-like. Must be None when start_or_bounds is a tuple.
end – Datetime-like. Must be None when start_or_bounds is a tuple.
format – A custom format to use. Defaults to FOLD_FORMAT if None, but note that only the start, mid and end keys are available to this function.
- Returns:
Formatted bounds string.
- Raises:
TypeError – If an incorrect number of timestamps are given.
Examples
Sample format output.
>>> to_string("2021-12-30", "2022-01-04", "2022-01-04 18:00:00")
"'2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00'"
The default format was used above.
Using properties. The delta is the distance from mid formatted by format_seconds(). The date property returns a datetime.date object.
>>> to_string(
...     ("2021-12-30", "2022-01-04", "2022-01-04 18:00:00"),
...     format="'{start.date}' [-{start.delta}] <= '{mid.iso}' < [+{end.delta}]",
... )
"'2021-12-30' [-5d] <= '2022-01-04T00:00:00' < [+18h]"
The delta is always positive; you must add the sign yourself.
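Keys such as {start.date} work because Python's str.format resolves attribute access inside replacement fields. The sketch below reproduces the property example above with small wrapper objects that expose date, iso and delta. format_delta_sketch is a simplified, hypothetical stand-in for format_seconds() whose real output for mixed durations may differ; FoldPoint and to_string_sketch are likewise hypothetical, and the parameter is named fmt to avoid shadowing the builtin.

```python
import datetime as dt
from dataclasses import dataclass


def format_delta_sketch(delta: dt.timedelta) -> str:
    """Hypothetical stand-in for format_seconds(): always positive, e.g. '5d' or '1d2h30m'."""
    seconds = int(abs(delta).total_seconds())
    parts = []
    for suffix, size in (("d", 86400), ("h", 3600), ("m", 60), ("s", 1)):
        value, seconds = divmod(seconds, size)
        if value:
            parts.append(f"{value}{suffix}")
    return "".join(parts) or "0s"


@dataclass(frozen=True)
class FoldPoint:
    """One of start/mid/end, exposing the properties used in format strings."""

    ts: dt.datetime
    mid: dt.datetime

    @property
    def date(self) -> dt.date:
        return self.ts.date()

    @property
    def iso(self) -> str:
        return self.ts.isoformat()

    @property
    def delta(self) -> str:
        # Distance from mid; the sign is dropped, as in the documented output.
        return format_delta_sketch(self.ts - self.mid)


def to_string_sketch(start: str, mid: str, end: str, fmt: str) -> str:
    """Hypothetical: only attribute-style keys such as {start.date} are supported."""
    points = [dt.datetime.fromisoformat(s) for s in (start, mid, end)]
    mid_ts = points[1]
    start_p, mid_p, end_p = (FoldPoint(ts, mid_ts) for ts in points)
    return fmt.format(start=start_p, mid=mid_p, end=end_p)
```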
Modules
Internal types.