time_split.integration.sklearn

Integration with the scikit-learn library.

Classes

ScikitLearnSplitter(*[, log_progress, verify_xy])

A scikit-learn compatible datetime splitter.

class ScikitLearnSplitter(*, log_progress: str | bool | Logger | LoggerAdapter[Any] | LogSplitProgressKwargs[MetricsType] = False, verify_xy: bool = True, **kwargs: Unpack[DatetimeIndexSplitterKwargs])

Bases: object

A scikit-learn compatible datetime splitter.

This class may be used to create temporal folds from heterogeneous/unaggregated data, typically used for training models (e.g. on raw transaction data). If your data is a well-formed time series, consider using the TimeSeriesSplit class from scikit-learn instead.

If a pandas object is passed to ScikitLearnSplitter.split(), its index is used to compute the splits.

Parameters:

log_progress – Controls logging of fold progress. See log_split_progress() for details.
verify_xy – If True, split X and y independently and verify that the resulting splits are equal.
**kwargs – Keyword arguments for split(). The available argument is managed by the integration.

For more information about the schedule, before/after, and expand_limits arguments, see the User guide.
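The idea of temporal folds over unaggregated rows can be sketched with the standard library alone. This is an illustration of the concept under simplifying assumptions, not the library's implementation: temporal_folds, its fold_times argument, and the fixed before/after windows are hypothetical names chosen for this example.

```python
from datetime import datetime, timedelta


def temporal_folds(timestamps, fold_times, before, after):
    """Yield (train, test) row positions around each fold time.

    Hypothetical helper for illustration only; not part of time_split.
    """
    for fold_time in fold_times:
        # Training rows fall in [fold_time - before, fold_time).
        train = [i for i, t in enumerate(timestamps) if fold_time - before <= t < fold_time]
        # Test rows fall in [fold_time, fold_time + after).
        test = [i for i, t in enumerate(timestamps) if fold_time <= t < fold_time + after]
        if train and test:  # skip folds with no data on either side
            yield train, test


# Raw rows: several events on some days, none on others.
timestamps = [
    datetime(2024, 1, 1, 9),
    datetime(2024, 1, 1, 17),
    datetime(2024, 1, 3, 8),
    datetime(2024, 1, 8, 12),
    datetime(2024, 1, 9, 7),
    datetime(2024, 1, 15, 10),
]
fold_times = [datetime(2024, 1, 8), datetime(2024, 1, 15)]

folds = list(
    temporal_folds(timestamps, fold_times, before=timedelta(days=7), after=timedelta(days=7))
)
for train, test in folds:
    print(train, test)
```

Fold membership here depends on each row's timestamp rather than on its position, which is why the number of rows per fold can vary.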

get_n_splits(X=None, y=None, groups=None)

Returns the number of splitting iterations in the cross-validator.

Equivalent to len(list(split(X, y, groups))).

Parameters:

X – Training data (features).
y – Target variable.
groups – Always ignored, exists for compatibility.

Returns:

Number of splits with given arguments.

Raises:

ValueError – If both X and y are None.
ValueError – If splits of X and y are not equal when verify_xy=True.
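The documented contract (split count equal to the number of yielded folds, plus the two ValueError conditions) can be sketched with a toy class. ToySplitter and its fold_size argument are hypothetical, and the folds are positional rather than datetime-based to keep the example short; only the contract is modeled, not the library's internals.

```python
class ToySplitter:
    """Hypothetical stand-in that mimics the documented contract."""

    def __init__(self, fold_size, verify_xy=True):
        self.fold_size = fold_size
        self.verify_xy = verify_xy

    def _folds(self, data):
        # Expanding-window folds over row positions (illustrative only).
        n = len(data)
        for start in range(self.fold_size, n, self.fold_size):
            yield list(range(start)), list(range(start, min(start + self.fold_size, n)))

    def split(self, X=None, y=None, groups=None):
        # groups is accepted for compatibility and ignored.
        if X is None and y is None:
            raise ValueError("Both X and y are None.")
        x_folds = None if X is None else list(self._folds(X))
        y_folds = None if y is None else list(self._folds(y))
        if self.verify_xy and x_folds is not None and y_folds is not None:
            if x_folds != y_folds:
                raise ValueError("Splits of X and y are not equal.")
        yield from (x_folds if x_folds is not None else y_folds)

    def get_n_splits(self, X=None, y=None, groups=None):
        # Count the folds by exhausting the generator.
        return len(list(self.split(X, y, groups)))


splitter = ToySplitter(fold_size=2)
X = [[0.1], [0.2], [0.3], [0.4], [0.5]]
y = [0, 1, 0, 1, 0]
n_splits = splitter.get_n_splits(X, y)
print(n_splits)  # 2
```

Because split() is written as a generator in this sketch, its errors surface only once iteration starts; get_n_splits() exhausts the generator and therefore raises immediately.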

split(X=None, y=None, groups=None)

Generate indices to split data into training and test sets.

Parameters:

X – Training data (features).
y – Target variable.
groups – Always ignored, exists for compatibility.

Yields:

The training and test set indices for each split.

Raises:

ValueError – If both X and y are None.
ValueError – If splits of X and y are not equal when verify_xy=True.
TypeError – If X or y has an index attribute, but the index elements are not datetime-like.
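The index handling and the TypeError condition can be pictured roughly as follows. This is a sketch under assumptions: resolve_timestamps is a hypothetical helper, SimpleNamespace stands in for a pandas object, and the actual check performed by the library may differ.

```python
from datetime import date, datetime
from types import SimpleNamespace


def resolve_timestamps(data):
    """Return the values used for splitting. Hypothetical helper."""
    index = getattr(data, "index", None)
    # Plain lists expose a callable .index method, which is not a data index.
    values = list(data) if index is None or callable(index) else list(index)
    for value in values:
        # datetime is a subclass of date, so this accepts both.
        if not isinstance(value, date):
            raise TypeError(f"Element {value!r} is not datetime-like.")
    return values


# Stand-in for a pandas object with a datetime index.
frame = SimpleNamespace(index=[datetime(2024, 1, 1), datetime(2024, 1, 2)])
resolved = resolve_timestamps(frame)

# An integer index is rejected.
try:
    resolve_timestamps(SimpleNamespace(index=[0, 1]))
except TypeError as error:
    print(error)
```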