Plotting metrics per fold and data set.

Using an unbounded timedelta-schedule, with custom bar labels.

import pandas
from numpy.random import default_rng
from rics import configure_stuff
from time_split import log_split_progress, plot, split

configure_stuff(datefmt="")

data = pandas.date_range("2022", "2022-2", freq="38min").to_series()
config = dict(schedule="7d", before="14d", after=1, available=data)
👻 Configured some stuff just the way I like it!

Unbounded (timedelta-string or CRON) schedules require available data to materialize the schedule. When using the plot function, this data is also used to create bar labels unless they’re given explicitly. We would like to plot metrics instead of just dataset sizes, so let’s create some dummy metrics.

metrics = {}
random = default_rng(2019_05_11).random
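
To see why an unbounded schedule needs available data, here is a minimal standard-library sketch of how a "7d" step might be turned into concrete fold dates. This is not how time_split implements it. The names data_start, data_end and candidates are hypothetical, anchoring the schedule at the start of the data is an assumption, and after=1 is read as "one schedule step".

```python
from datetime import datetime, timedelta

# Assumed stand-ins for the example's data range: the first and last
# timestamps of a 38-minute series covering January 2022.
data_start = datetime(2022, 1, 1)
data_end = datetime(2022, 1, 31, 23, 32)

step = timedelta(days=7)     # schedule="7d"
before = timedelta(days=14)  # before="14d"
after = step                 # after=1, read here as one schedule step (assumption)

# Without a data range there is no first or last timestamp to step between.
candidates = []
t = data_start
while t <= data_end:
    candidates.append(t)
    t += step

# Keep only the candidates whose before/after windows fit inside the data.
folds = [t for t in candidates if t - before >= data_start and t + after <= data_end]

for t in folds:
    print(t.date())
```

Under these assumptions, only two of the five candidate dates have room for both a 14-day "before" window and a 7-day "after" window.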

Adding a get_metrics callback to time_split.log_split_progress() will add formatted metric output to the fold-end message emitted at the end of each iteration.

for fold in log_split_progress(
    split(**config),
    get_metrics=lambda k: metrics[k.date()],
):
    metrics[fold.mid.date()] = {
        "before": {"rmse": 2 * random(), "mae": random(), "r2": -random()},
        "after": {"rmse": 3 * random(), "mae": 1.5 * random()},
    }
[time_split:INFO] Begin fold 1/2: '2022-01-01' <= [schedule: '2022-01-15' (Saturday)] < '2022-01-22'.
[time_split:INFO] Finished fold 1/2: [schedule: '2022-01-15' (Saturday)] after 29 μs. Fold metrics:
        rmse   mae     r2
before  1.85 0.518 -0.974
after  2.991 0.466    NaN
[time_split:INFO] Begin fold 2/2: '2022-01-08' <= [schedule: '2022-01-22' (Saturday)] < '2022-01-29'.
[time_split:INFO] Finished fold 2/2: [schedule: '2022-01-22' (Saturday)] after 13 μs. Fold metrics:
        rmse   mae     r2
before 1.266 0.921 -0.135
after  0.009 0.301    NaN
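
Note that the callback reads from metrics, which is only filled inside the loop body. This works as long as the callback is invoked after the body has run. The sketch below imitates that ordering with a plain generator. log_progress_sketch and its emit parameter are hypothetical stand-ins, not part of time_split, and plain dates replace the real fold objects.

```python
from datetime import date


def log_progress_sketch(folds, get_metrics, emit=print):
    """Hypothetical stand-in for a progress logger; NOT time_split's implementation."""
    for mid in folds:
        emit(f"Begin fold: {mid}")
        yield mid  # the caller's loop body runs while the generator is paused here
        # By now the loop body has stored the metrics, so the lookup succeeds.
        emit(f"Finished fold: {mid}. Fold metrics: {get_metrics(mid)}")


metrics = {}
messages = []
fold_dates = [date(2022, 1, 15), date(2022, 1, 22)]

for mid in log_progress_sketch(
    fold_dates,
    get_metrics=lambda k: metrics[k],
    emit=messages.append,  # collect messages instead of printing them
):
    metrics[mid] = {"rmse": 1.0, "mae": 0.5}  # placeholder values

print("\n".join(messages))
```

If the callback were called before the loop body, the dictionary lookup would raise a KeyError, which is why the metrics appear in the fold-end message rather than the fold-begin message.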

The bar_labels argument expects a list of tuples of the form [("left-label", "right-label")], one tuple per fold. Labels are applied in the same order in which split() returns the folds.

bar_labels = [
    (
        (
            f"Training metrics ({date}):\n"  # Header
            + pandas.Series(fold_metrics["before"]).to_string(float_format="%.2f")
        ),
        pandas.Series(fold_metrics["after"]).to_string(float_format="%.2f"),
    )
    for date, fold_metrics in metrics.items()
]
ax = plot(**config, bar_labels=bar_labels)
ax.figure.set_size_inches(20, 6)
Figure: time_split.split(schedule='7d', before='14d', available=pd.Series)

Bar height does not adapt to the bar_labels text, so when the labels are long, configure the figure size beforehand, e.g. through rcParams["figure.figsize"]. Alternatively, you may pass a pre-initialized matplotlib.axes.Axes instance using the ax keyword argument.
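
Both options can be sketched with matplotlib alone. The call to plot is left as a comment so the snippet runs without time_split installed, and the (20, 6) size simply mirrors the value used above. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Option 1: change the default figure size before any figure is created.
with plt.rc_context({"figure.figsize": (20, 6)}):
    fig_default, _ = plt.subplots()

# Option 2: create the Axes yourself and hand it to the plotting function.
fig, ax = plt.subplots(figsize=(20, 6))
# plot(**config, bar_labels=bar_labels, ax=ax)  # config and bar_labels as defined earlier

width, height = fig.get_size_inches()
print(width, height)
```

Either way, the figure is already large enough when the labels are drawn, so no resizing is needed afterwards.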
