time_split.support
Supporting functions.
These functions are used internally, but are also exposed here so that users can build their own logic on top of the internals, or simply test things out.
Warning
Not part of the stable API.
This module may change without notice. Stick to the top-level time_split-module, or lock down your
dependencies if you need to use the support module.
Functions
create_explorer_link | Create a Time Fold Explorer application URL.
default_metrics_formatter | Default formatting implementation.
expand_limits | Derive the "real" bounds of limits.
fold_weight | Compute fold weights.
format_expanded_limits | Format expanded limits.
process_available | Process a user-given available argument.
to_string | Pretty-print a fold.
Classes
DatetimeIndexSplitter | Backend interface for splitting user data.
- class DatetimeIndexSplitter(schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64, before: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = '7d', after: int | Literal['all'] | str | Timedelta | timedelta | timedelta64 = 1, step: int = 1, n_splits: int = 0, expand_limits: bool | Literal['auto'] | str = 'auto', ignore_filters: bool = False)
Bases: object
Backend interface for splitting user data. See the Parameter overview page.
- schedule: DatetimeIndex | Iterable[str | Timestamp | datetime | date | datetime64] | str | Timedelta | timedelta | timedelta64
- get_splits(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitBounds]
Compute a split of given user data.
- get_plot_data(available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> tuple[list[DatetimeSplitBounds], MaterializedSchedule]
Returns additional data needed to visualize folds.
- as_dict() -> DatetimeIndexSplitterKwargs
Returns the splitter as a dict.
- expand_limits(limits: tuple[Timestamp, Timestamp], spec: bool | Literal['auto'] | str | tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64] | Iterable[tuple[str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64, str | Timedelta | timedelta | timedelta64]] = 'auto') -> tuple[Timestamp, Timestamp]
Derive the “real” bounds of limits.
- Parameters:
limits – A tuple (min, max) of timestamps.
spec – Expansion spec as described in the User guide. Also supports level-tuples [(start_at, round_to, tolerance)...]. Passing expand_limits=[settings.auto_expand_limits.day, settings.auto_expand_limits.hour] is equivalent to expand_limits='auto'.
- Returns:
Limits rounded according to the given specification.
- Raises:
ValueError – For invalid limits.
Examples
>>> from pandas import Timestamp
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 22:05:30")
Basic usage.
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-12 00:00:00'))
You may specify a maximum “distance” that limits may be expanded.
>>> expand_limits(limits, "d<1h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 22:05:30'))
Limits will never be rounded in the “wrong” direction…
>>> limits = Timestamp("2019-05-11"), Timestamp("2019-05-11 11:05:30")
>>> expand_limits(limits, "d")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
…even if you make the tolerance large enough.
>>> expand_limits(limits, "d<14h")
(Timestamp('2019-05-11 00:00:00'), Timestamp('2019-05-11 11:05:30'))
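One way to read the examples above: each limit is rounded to the nearest whole day, kept as-is if that would shrink the range, and kept as-is if the move exceeds the tolerance. The following standard-library sketch illustrates that reading only. expand_to_day_sketch is a hypothetical helper that is not part of time_split, it handles day-level rounding only, and the actual implementation may work differently.

```python
from datetime import datetime, timedelta

DAY = timedelta(days=1)


def _nearest_midnight(ts):
    # Round to the closest midnight (assumption: ties round up).
    floor = datetime(ts.year, ts.month, ts.day)
    return floor if ts - floor < DAY / 2 else floor + DAY


def expand_to_day_sketch(limits, tolerance=None):
    """Hypothetical illustration of outward-only rounding with a tolerance."""
    start, end = limits
    new_start, new_end = _nearest_midnight(start), _nearest_midnight(end)

    # The start may only move backward, and by no more than the tolerance.
    if new_start > start or (tolerance is not None and start - new_start > tolerance):
        new_start = start
    # The end may only move forward, and by no more than the tolerance.
    if new_end < end or (tolerance is not None and new_end - end > tolerance):
        new_end = end
    return new_start, new_end


limits = datetime(2019, 5, 11), datetime(2019, 5, 11, 22, 5, 30)
print(expand_to_day_sketch(limits))
print(expand_to_day_sketch(limits, tolerance=timedelta(hours=1)))
```

The first call moves the end to the next midnight. The second leaves it untouched because the move is longer than one hour.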
- process_available(available: Iterable[str | Timestamp | datetime | date | datetime64], *, expand_limits: bool | Literal['auto'] | str) -> ProcessAvailableResult
Process a user-given available argument.
- Parameters:
available – Available data from user. May be None.
expand_limits – Expansion spec as described in the User guide. Determines how much (if at all) to expand limits.
- Returns:
A tuple (available, limits). Note that available will be None if it has not been iterated over. This ensures that iterables are not consumed unless needed.
- Raises:
ValueError – For invalid available arguments.
- default_metrics_formatter(end_message: str, metrics: dict[Any, Any] | Series | DataFrame | str | Any) -> str
Default formatting implementation.
Format using an appropriate pandas to_string()-method if metrics is a dict or a pandas type. Nested dictionaries are flattened using flatten_dict() if metrics is a dict-of-dicts.
Metrics of type str are assumed to be preformatted, and are appended to end_message as-is.
If any other types are given, fall back to f"{end_message} Metrics: {metrics}".
Examples
Formatting a nested dict.
>>> metrics = {"rmse": {"train": 0.11, "test": 0.5, "future": 20.19}}
>>> print(default_metrics_formatter("End message.", metrics))
End message. Fold metrics:
      train  test  future
rmse   0.11   0.5   20.19
Formatting a pandas.DataFrame.
>>> import pandas as pd
>>> metrics = {"me": [0.1, 0.2, 0.3], "rmse": [0.11, 0.5, 20.19]}
>>> df = pd.DataFrame(metrics, index=["train", "test", "future"])
>>> print(default_metrics_formatter("End message.", df))
End message. Fold metrics:
         me   rmse
train   0.1   0.11
test    0.2   0.50
future  0.3  20.19
The index is printed unless it is a pandas.RangeIndex.
- fold_weight(splits: list[DatetimeSplitBounds], *, unit: str | Literal['rows', 'hours', 'days'] = 'hours', available: Iterable[str | Timestamp | datetime | date | datetime64] | None = None) -> list[DatetimeSplitCounts]
Compute fold weights.
- Parameters:
splits – List of DatetimeSplitBounds.
unit – Time unit of the returned count, or 'rows' (requires available data).
available – Available data. Required when unit='rows'.
- Returns:
A list of tuples [(n_data_units, n_future_data_units), ...].
- Raises:
ValueError – If unit='rows' and available=None.
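As a rough illustration of time-based weights, the sketch below assumes that a fold is a (start, mid, end) tuple, that the data part spans start to mid, and that the future part spans mid to end. fold_hours_sketch is a hypothetical helper and not the library's implementation. It covers only unit='hours' and ignores the 'rows' case entirely.

```python
from datetime import datetime


def fold_hours_sketch(folds):
    """Return [(n_data_hours, n_future_hours), ...] for (start, mid, end) folds."""
    weights = []
    for start, mid, end in folds:
        # Assumption: the weight of each part is simply its duration in hours.
        data_hours = (mid - start).total_seconds() / 3600
        future_hours = (end - mid).total_seconds() / 3600
        weights.append((data_hours, future_hours))
    return weights


folds = [(datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18))]
print(fold_hours_sketch(folds))  # [(120.0, 18.0)]
```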
- to_string(bounds: str | Timestamp | datetime | date | datetime64 | DatetimeSplitBounds | tuple[str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64, str | Timestamp | datetime | date | datetime64], mid: str | Timestamp | datetime | date | datetime64 | None = None, end: str | Timestamp | datetime | date | datetime64 | None = None, /, *, format: str | None = None) -> str
Pretty-print a fold.
Sample output.
('2021-12-30' <= [schedule: '2022-01-04' (Tuesday)] < '2022-01-04 18:00:00')
- Parameters:
bounds – A fold tuple (start, mid, end), or just start (followed by mid and end).
mid – Datetime-like. Must be None when bounds is a tuple.
end – Datetime-like. Must be None when bounds is a tuple.
format – A custom format to use. Uses FOLD_FORMAT if None, but note that only the start, mid and end keys are available to this function.
- Returns:
Formatted bounds string.
- Raises:
TypeError – If an incorrect number of timestamps are given.
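To make the sample output concrete, here is a standard-library sketch that reproduces the same string from three timestamps. fold_to_string_sketch and its hard-coded layout are hypothetical. The actual FOLD_FORMAT is not shown on this page, and the rule of dropping the time part at midnight is an assumption inferred from the sample.

```python
from datetime import datetime


def _fmt(ts):
    # Assumption: the time part is omitted when the timestamp is exactly midnight.
    is_midnight = ts == ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return ts.strftime("%Y-%m-%d" if is_midnight else "%Y-%m-%d %H:%M:%S")


def fold_to_string_sketch(start, mid, end):
    """Hypothetical formatter that uses only the start, mid and end values."""
    # %A yields the weekday name (English in the default C locale).
    return f"('{_fmt(start)}' <= [schedule: '{_fmt(mid)}' ({mid:%A})] < '{_fmt(end)}')"


print(fold_to_string_sketch(
    datetime(2021, 12, 30), datetime(2022, 1, 4), datetime(2022, 1, 4, 18)
))
```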
- format_expanded_limits(original: tuple[Timestamp, Timestamp], *, expanded: tuple[Timestamp, Timestamp] | None = None, expand_limits: bool | Literal['auto'] | str) -> str
Format expanded limits.
- Parameters:
original – The original data limits.
expanded – Expanded data limits. Derived from original and expand_limits if None.
expand_limits – Limits expansion spec.
- Returns:
A string.
- create_explorer_link(host: str, data: Iterable[str | Timestamp | datetime | date | datetime64] | str | None = None, available: Iterable[str | Timestamp | datetime | date | datetime64] | str | None = None, *, show_removed: bool = True, skip_default: bool = False, **kwargs: Unpack[DatetimeIndexSplitterKwargs]) -> str
Create a Time Fold Explorer application URL.
- Parameters:
host – Base address where the application is hosted.
data – Binds schedule to a range. Regular available arguments (as passed to e.g. time_split.split()) are encoded as a date range to generate dummy data for. Pass a str to use a dataset bundled by the server instead. Note that this function cannot verify the kwargs if available is a str dataset.
available – Alias of data.
skip_default – If True, do not include default split params in the link.
show_removed – If True, splits removed by n_splits or step are included in the figure.
kwargs – Keyword arguments for the time_split.split()-function.
- Returns:
An encoded URL.
Examples
Getting the URL for a local host.
>>> create_explorer_link(
...     host="http://localhost:8501",
...     available=("2019-04-11 00:35:00", "2019-05-11 21:30:00"),
...     schedule="0 0 * * MON,FRI",
... )
'http://localhost:8501?data=1554942900-1557610200&schedule=0+0+%2A+%2A+MON%2CFRI&show_removed=True'
To start the application locally using Docker, run
docker run -p 8501:8501 rsundqvist/time-split
in the terminal.
Modules
Internal types. |