Backtesting with ML models

Let’s train regression models with different regression models.

Loading the data

Utilies and data libraries

import pandas as pd
import datetime
import json

Load the metadata

raw_json = open("./settings/settings.json")
meta_json = json.load(raw_json)
backtesting_path = meta_json["data"]["backtesting_path"]

Load the dataset and reformat it

ts = pd.read_csv("data/data.csv")
ts["ds"] = pd.to_datetime(ts["ds"])
ts = ts.sort_values("ds")
ts = ts[["unique_id", "ds", "y"]]
ts.head()
unique_id ds y
0 1 2022-11-11 23:00:00 456403
1 1 2022-11-12 00:00:00 458842
2 1 2022-11-12 01:00:00 455111
3 1 2022-11-12 02:00:00 448035
4 1 2022-11-12 03:00:00 438165

Set the data

# os.environ["NIXTLA_ID_AS_COL"] = "1"
from utilsforecast.plotting import plot_series
plot_series(ts, engine = "plotly").update_layout(height=300)

Set the Backtesting Process

In the following example, we will demonstrate how to set a simple backtesting process that train multiple machine learning models.

Let’s start by defining the models:

  • Regression based on k-nearest neighbors (see model documentation)
  • Multi-layer Perceptron regressor (see model documentation)
  • ElasticNet - Linear regression with combined L1 and L2 priors as regularizer (see model documentation)

Note that we will set an initial higher max iteration for the Multi-layer Perceptron and ElasticNet models using the max_iter argument:

from sklearn.linear_model import  ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor



ml_models = {
    "knn": KNeighborsRegressor(),
    "mlp": MLPRegressor(max_iter=1000),
    "enet": ElasticNet(max_iter=1000)

}

We will use the following features:

  • Lags - lag 1 and 24
  • Seasonal features - monthly, daily, day of the week, weekly
lags = [1, 24]
date_features = ["month", "day", "dayofweek", "week", "hour"]

Let’s define the forecast object:

from mlforecast import MLForecast
from mlforecast.utils import PredictionIntervals

mlf = MLForecast(
    models=ml_models,
    freq="h",
    lags = lags,
    date_features = date_features
)

Next, we will define the backtesting parameters. We will use a backtesting with four testing partitions, each testing partition with length of 72 hours, and overlapping of 12 hours between each partition. In adddion we will set a 95% prediction intervals using conformal distribution method:

h = 72
step_size = 12
partitions = 4
n_windows = 3
method = "conformal_distribution"
pi = PredictionIntervals(h=h, n_windows = n_windows, method = method)
levels = [95]

Let’s run the backtesting using the cross_validation method:

bkt_df = mlf.cross_validation(
        df = ts,
        step_size= step_size,
        n_windows=partitions,
        prediction_intervals=PredictionIntervals(n_windows=2, h=h),
        level= levels,
        h=h,
        fitted=True,)
/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.475e+11, tolerance: 9.972e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.601e+11, tolerance: 9.997e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.552e+11, tolerance: 9.973e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.639e+11, tolerance: 9.998e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.564e+11, tolerance: 9.976e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.636e+11, tolerance: 1.000e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.561e+11, tolerance: 9.976e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.512e+11, tolerance: 1.000e+10
bkt_df.head()
unique_id ds cutoff y knn mlp enet knn-lo-95 knn-hi-95 mlp-lo-95 mlp-hi-95 enet-lo-95 enet-hi-95
0 1 2024-11-26 12:00:00 2024-11-26 11:00:00 423022 416257.1875 409043.81250 413452.84375 408580.041406 423934.333594 396099.003906 421988.621094 402097.917187 424807.770313
1 1 2024-11-26 13:00:00 2024-11-26 11:00:00 446189 420906.1875 420555.43750 428518.75000 400963.112500 440849.262500 398524.904687 442585.970313 409926.205469 447111.294531
2 1 2024-11-26 14:00:00 2024-11-26 11:00:00 458839 427260.0000 430777.03125 441908.21875 408422.626563 446097.373437 408585.832813 452968.229687 424869.825781 458946.611719
3 1 2024-11-26 15:00:00 2024-11-26 11:00:00 464379 443909.8125 438137.00000 452244.65625 425663.901562 462155.723438 416472.705469 459801.294531 436616.064063 467873.248437
4 1 2024-11-26 16:00:00 2024-11-26 11:00:00 465544 453961.8125 443568.50000 460505.53125 430425.642188 477497.982812 414669.038281 472467.961719 437231.853125 483779.209375

For convenience reasons, we will map to the partition label their numeric order (as opposed to the timestamp):

::: {#cell-Partition mapping .cell execution_count=12}

cutoff = bkt_df["cutoff"].unique()
partitions_mapping = pd.DataFrame({"cutoff": cutoff, 
    "partition": range(1, len(cutoff) + 1)})

partitions_mapping.head()
cutoff partition
0 2024-11-26 11:00:00 1
1 2024-11-26 23:00:00 2
2 2024-11-27 11:00:00 3
3 2024-11-27 23:00:00 4

:::

Let’s merge it to the backtesting table:

bkt_df = bkt_df.merge(partitions_mapping, how = "left", on = ["cutoff"])

Last but not least, let’s plot the results:

from plotly.subplots import make_subplots
import plotly.graph_objects as go

partitions_labels =  bkt_df["partition"].unique()

ts_sub = ts[ts["ds"] > ts["ds"].max() -  datetime.timedelta(hours = 24 * 7)]
fig = make_subplots(rows=partitions, cols=1, subplot_titles= ["Partitions: " + str(i) for i in partitions_labels])


r = 1

for i in partitions_labels:
    if r == 1:
        showlegend = True
    else:
        showlegend = False
    bkt_sub = bkt_df[bkt_df["partition"] == i]
    fig.append_trace(go.Scatter(x= ts_sub["ds"], y=ts_sub["y"], legendgroup = "actual", showlegend = showlegend, mode='lines', name='Actual', line=dict(color='#023047', width=2)), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["knn"], mode='lines', name='k-nearest neighbors', legendgroup = "knn", showlegend = showlegend, line=dict(color='#2a9d8f', width=1.5, dash = "dash")), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["mlp"], mode='lines', name='Multi-layer Perceptron',legendgroup = "mlp", showlegend = showlegend, line=dict(color='#0077b6', width=1.5, dash = "dot")), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["enet"], mode='lines', name='ElasticNet',legendgroup = "enet", showlegend = showlegend, line=dict(color='#ffc8dd', width=1.5, dash = "dot")), row = r, col = 1)
    r = r + 1


fig.update_layout(height=600)
fig.show()

Scoring the models

In this section, we will process the backtesting output and score the models. This includes the following steps: - Transition the backtesting dataframe from wide to long format - Calculate the models performance on each testing partition

We will use the following error metrics to evaluate the models performance: - MAPE - Mean Absolute Percentage Error - RMSE - Root Mean Square Error - Coverage - percentage of actual values that were within the prediction intervals range

Let’s reformat the data, transform the backtesting table - bkt_df from wide to long.

We will use the melt function transition the table into long format, where we assign the transform fields names into new column named model_label and the corresponding values into the value column:

models = list(ml_models.keys()) 
bkt_long = pd.melt(
    bkt_df,
    id_vars=["unique_id", "ds", "partition", "y"],
    value_vars=models + [f"{model}-lo-95" for model in models] \
                      + [f"{model}-hi-95" for model in models],
    var_name="model_label",
    value_name="value",
)


bkt_long.head()
unique_id ds partition y model_label value
0 1 2024-11-26 12:00:00 1 423022 knn 416257.1875
1 1 2024-11-26 13:00:00 1 446189 knn 420906.1875
2 1 2024-11-26 14:00:00 1 458839 knn 427260.0000
3 1 2024-11-26 15:00:00 1 464379 knn 443909.8125
4 1 2024-11-26 16:00:00 1 465544 knn 453961.8125

We will use the following function to relabel the forecast and prediction intervals values into forecast, lower and upper:

::: {#cell-Relabel the PI field .cell execution_count=16}

def split_model_confidence(model_name):
    if "-lo-95" in model_name:
        return model_name.replace("-lo-95", ""), "lower"
    elif "-hi-95" in model_name:
        return model_name.replace("-hi-95", ""), "upper"
    else:
        return model_name, "forecast"

bkt_long["model_label"],\
bkt_long["type"] = zip(*bkt_long["model_label"].map(split_model_confidence))
bkt_long.head()
unique_id ds partition y model_label value type
0 1 2024-11-26 12:00:00 1 423022 knn 416257.1875 forecast
1 1 2024-11-26 13:00:00 1 446189 knn 420906.1875 forecast
2 1 2024-11-26 14:00:00 1 458839 knn 427260.0000 forecast
3 1 2024-11-26 15:00:00 1 464379 knn 443909.8125 forecast
4 1 2024-11-26 16:00:00 1 465544 knn 453961.8125 forecast

:::

In addition, we will map the model functions name to the labels we created and merge it later with the backtesting table:

model_label = list(ml_models.keys()) 
model_name = [type(s).__name__ for s in list(ml_models.values())] 

models_mapping = pd.DataFrame({"model_label": model_label, "model_name": model_name})

models_mapping
model_label model_name
0 knn KNeighborsRegressor
1 mlp MLPRegressor
2 enet ElasticNet

Next, let’s use the pivot function to pivot the type filed into three new fields and merge it with the partitions mapping table:

bkt = (bkt_long
.pivot(index = ["unique_id", "ds", "model_label","partition", "y"], columns = "type", values = "value")
.reset_index()
.merge(models_mapping, how = "left", on = ["model_label"])
)
bkt.head()
unique_id ds model_label partition y forecast lower upper model_name
0 1 2024-11-26 12:00:00 enet 1 423022 413452.84375 402097.917187 424807.770313 ElasticNet
1 1 2024-11-26 12:00:00 knn 1 423022 416257.18750 408580.041406 423934.333594 KNeighborsRegressor
2 1 2024-11-26 12:00:00 mlp 1 423022 409043.81250 396099.003906 421988.621094 MLPRegressor
3 1 2024-11-26 13:00:00 enet 1 446189 428518.75000 409926.205469 447111.294531 ElasticNet
4 1 2024-11-26 13:00:00 knn 1 446189 420906.18750 400963.112500 440849.262500 KNeighborsRegressor

Now we can score the models results using the following helpers functions:

from statistics import mean
def mape(y, yhat):
    mape = mean(abs(y - yhat)/ y) 
    return mape

def rmse(y, yhat):
    rmse = (mean((y - yhat) ** 2 )) ** 0.5
    return rmse

def coverage(y, lower, upper):
    coverage = sum((y <= upper) & (y >= lower)) / len(y)
    return coverage
def score(df):
    mape_score = mape(y = df["y"], yhat = df["forecast"])
    rmse_score = rmse(y = df["y"], yhat = df["forecast"])
    coverage_score = coverage(y = df["y"], lower = df["lower"], upper = df["upper"])
    cols = ["mape","rmse", "coverage"]
    d = pd.Series([mape_score, rmse_score,  coverage_score], index=cols)

    return d

We will group by backtesting table by the series unique id, model label and partition and calculate its score:

score_df = (bkt
.groupby(["unique_id", "model_label", "model_name", "partition"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
)

score_df
unique_id model_label model_name partition mape rmse coverage
0 1 enet ElasticNet 1 0.046387 26248.918613 0.791667
1 1 enet ElasticNet 2 0.040316 22284.199211 0.722222
2 1 enet ElasticNet 3 0.048238 25473.070455 0.541667
3 1 enet ElasticNet 4 0.056620 29758.941497 0.597222
4 1 knn KNeighborsRegressor 1 0.071400 38516.048318 0.708333
5 1 knn KNeighborsRegressor 2 0.065884 36684.848083 0.597222
6 1 knn KNeighborsRegressor 3 0.039062 20619.197374 0.861111
7 1 knn KNeighborsRegressor 4 0.046386 24855.016077 0.763889
8 1 mlp MLPRegressor 1 0.051659 26748.692236 0.875000
9 1 mlp MLPRegressor 2 0.049206 25506.360560 0.847222
10 1 mlp MLPRegressor 3 0.042411 22529.148592 0.763889
11 1 mlp MLPRegressor 4 0.051685 27958.964509 0.722222

Creating a leaderboard table:

leaderboard = (bkt
.groupby(["unique_id", "model_label", "model_name"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
.sort_values(by = "mape")
)

leaderboard
unique_id model_label model_name mape rmse coverage
0 1 enet ElasticNet 0.047890 26077.112476 0.663194
2 1 mlp MLPRegressor 0.048740 25764.962634 0.802083
1 1 knn KNeighborsRegressor 0.055683 31113.450393 0.732639

Experimentation

Let’s now generalized the previous steps and set up an experimentation. The mean goal is to identify which model perform best. This includes identify the tuning parameters and features yield the best performance.

Let’s take the three models we used before and try different tuning parameters: - For the Multi-layer Perceptron regressor we will test different hidden layer size setting - For the ElasticNet model we will test different l1 ratio which defines the ration between L1 and L2 penalty

ml_models = {
    "knn": KNeighborsRegressor(),
    "mlp1": MLPRegressor(max_iter=2000, hidden_layer_sizes = (100,)),
    "mlp2": MLPRegressor(max_iter=2000, hidden_layer_sizes = (50,)),
    "mlp3": MLPRegressor(max_iter=2000, hidden_layer_sizes = (200,)),
    "enet1": ElasticNet(max_iter=2000, l1_ratio = 0, tol=0.001),
    "enet2": ElasticNet(max_iter=2000, l1_ratio = 0.5, tol=0.001),
    "enet3": ElasticNet(max_iter=2000, l1_ratio = 1, tol=0.001),

}

We will use the same features settings as before:

lags = [1, 24]
date_features = ["month", "day", "dayofweek", "week", "hour"]

And the same backtesting settings:

h = 72
step_size = 12
partitions = 4
n_windows = 3
method = "conformal_distribution"
pi = PredictionIntervals(h=h, n_windows = n_windows, method = method)
levels = [95]

Let’s set the forecasting object:

mlf = MLForecast(
    models=ml_models,
    freq="h",
    lags = lags,
    date_features= date_features
)

And apply the backtesting:

bkt_df = mlf.cross_validation(
        df = ts,
        step_size= step_size,
        n_windows=partitions,
        prediction_intervals=PredictionIntervals(n_windows=2, h=h),
        level= levels,
        h=h,
        fitted=True,)
/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.717e+11, tolerance: 9.972e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.518e+11, tolerance: 9.972e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.344e+11, tolerance: 9.972e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.781e+11, tolerance: 9.997e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.616e+11, tolerance: 9.997e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.720e+11, tolerance: 9.973e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.560e+11, tolerance: 9.973e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.395e+11, tolerance: 9.973e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.785e+11, tolerance: 9.998e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.635e+11, tolerance: 9.998e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.458e+11, tolerance: 9.998e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.724e+11, tolerance: 9.976e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.571e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.341e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.787e+11, tolerance: 1.000e+11 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.613e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.426e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.743e+11, tolerance: 9.976e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.573e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.337e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.790e+11, tolerance: 1.000e+11 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.549e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.474e+11, tolerance: 1.000e+11

We will use the following function to transform the backtesting object from wide to long:

def bkt_wide_2_long(bkt, models):
    # Mapping the models labels
    model_labels = list(models.keys())
    model_name = [type(s).__name__ for s in models.values()] 
    models_mapping = pd.DataFrame({"model_label": model_labels, "model_name": model_name})
    # Mapping the partitions
    cutoff = bkt["cutoff"].unique()
    partitions_mapping = pd.DataFrame({"cutoff": cutoff, 
    "partition": range(1, len(cutoff) + 1)})
    bkt = bkt.merge(partitions_mapping, how = "left", on = ["cutoff"])
    # Melting the bkt object to long
    bkt_long = pd.melt(
        bkt,
        id_vars=["unique_id", "ds", "partition", "y"],
        value_vars=model_labels + [f"{model}-lo-95" for model in model_labels] \
                          + [f"{model}-hi-95" for model in model_labels],
        var_name="model_label",
        value_name="value",
    )

    bkt_long["model_label"],bkt_long["type"] = zip(*bkt_long["model_label"].map(split_model_confidence))
    
    bkt = (bkt_long
            .pivot(index = ["unique_id", "ds", "model_label","partition", "y"], columns = "type", values = "value")
            .reset_index()
            .merge(models_mapping, how = "left", on = ["model_label"])
            )

    return bkt
bkt = bkt_wide_2_long(bkt = bkt_df, models = ml_models)

bkt.head()
unique_id ds model_label partition y forecast lower upper model_name
0 1 2024-11-26 12:00:00 enet1 1 423022 412914.00000 401216.973438 424611.026562 ElasticNet
1 1 2024-11-26 12:00:00 enet2 1 423022 413452.84375 402097.917187 424807.770313 ElasticNet
2 1 2024-11-26 12:00:00 enet3 1 423022 414156.62500 403249.959375 425063.290625 ElasticNet
3 1 2024-11-26 12:00:00 knn 1 423022 416257.18750 408580.041406 423934.333594 KNeighborsRegressor
4 1 2024-11-26 12:00:00 mlp1 1 423022 409429.62500 396350.746875 422508.503125 MLPRegressor
score_df = (bkt
.groupby(["unique_id", "model_label", "model_name", "partition"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
)

score_df
unique_id model_label model_name partition mape rmse coverage
0 1 enet1 ElasticNet 1 0.046022 25850.656158 0.805556
1 1 enet1 ElasticNet 2 0.040436 22229.766257 0.763889
2 1 enet1 ElasticNet 3 0.047834 25163.856352 0.569444
3 1 enet1 ElasticNet 4 0.056144 29338.262505 0.569444
4 1 enet2 ElasticNet 1 0.046387 26248.881182 0.791667
5 1 enet2 ElasticNet 2 0.040316 22284.082250 0.722222
6 1 enet2 ElasticNet 3 0.048238 25473.073322 0.541667
7 1 enet2 ElasticNet 4 0.056620 29758.980748 0.597222
8 1 enet3 ElasticNet 1 0.046769 26734.650723 0.750000
9 1 enet3 ElasticNet 2 0.040188 22320.745610 0.680556
10 1 enet3 ElasticNet 3 0.048707 25853.755042 0.555556
11 1 enet3 ElasticNet 4 0.057163 30287.218437 0.569444
12 1 knn KNeighborsRegressor 1 0.071400 38516.048318 0.708333
13 1 knn KNeighborsRegressor 2 0.065884 36684.848083 0.597222
14 1 knn KNeighborsRegressor 3 0.039062 20619.197374 0.861111
15 1 knn KNeighborsRegressor 4 0.046386 24855.016077 0.763889
16 1 mlp1 MLPRegressor 1 0.052514 27242.245521 0.861111
17 1 mlp1 MLPRegressor 2 0.048664 24604.639462 0.847222
18 1 mlp1 MLPRegressor 3 0.047530 23878.789059 0.750000
19 1 mlp1 MLPRegressor 4 0.049799 27118.050934 0.680556
20 1 mlp2 MLPRegressor 1 0.059734 32490.520613 0.722222
21 1 mlp2 MLPRegressor 2 0.051235 27731.687214 0.763889
22 1 mlp2 MLPRegressor 3 0.079752 42953.540592 0.652778
23 1 mlp2 MLPRegressor 4 0.050840 27427.795038 0.722222
24 1 mlp3 MLPRegressor 1 0.050637 26258.505875 0.875000
25 1 mlp3 MLPRegressor 2 0.049389 25730.788109 0.750000
26 1 mlp3 MLPRegressor 3 0.051717 27861.066305 0.736111
27 1 mlp3 MLPRegressor 4 0.049857 26739.627054 0.666667

Logging the Results to MLflow

Let’s load the MLflow library and define the experiment name:

import mlflow
import datetime
experiment_name = "ml_forecast_exp01"
mlflow_path = "file:///mlruns"

We will log the backtesting parameters at tag:

tags = {"h": h,
"step_size": step_size,
"partitions": partitions,
"intervals_type": "ConformalIntervals",
"intervals_h": h,
"intervals_n_windows": n_windows,
"intervals_method": "conformal_distribution",
"levels": levels }
try:
    mlflow.create_experiment(name = experiment_name,
                            artifact_location= mlflow_path,
                            tags = tags)
    meta = mlflow.get_experiment_by_name(experiment_name)
    print(f"Set a new experiment {experiment_name}")
    print("Pulling the metadata") 
except: 
    print(f"Experiment {experiment_name} exists, pulling the metadata")
    meta = mlflow.get_experiment_by_name(experiment_name)
Set a new experiment ml_forecast_exp01
Pulling the metadata
run_time = datetime.datetime.now().strftime("%Y-%m-%d %H-%M-%S")
for index, row in score_df.iterrows():
    run_name = row["model_label"] + "-" + run_time 
    with mlflow.start_run(experiment_id = meta.experiment_id, run_name = run_name,
                tags = {"type": "backtesting","partition": row["partition"], 
                "unique_id": row["unique_id"],"model_label": row["model_label"],
                "model_name": row["model_name"],"run_name": run_name}) as run:
        model_params = ml_models[row["model_label"]].get_params() 
        model_params["model_name"] =  row["model_name"]
        model_params["model_label"] =  row["model_label"]
        model_params["partition"] =  row["partition"]
        model_params["lags"] =  lags
        model_params["date_features"] = date_features
        mlflow.log_params(model_params)
        mlflow.log_metric("mape", row["mape"])
        mlflow.log_metric("rmse", row["rmse"])
        mlflow.log_metric("coverage", row["coverage"])
results = mlflow.search_runs(experiment_ids=[meta.experiment_id], order_by=["metrics.mape"])

results.head()
run_id experiment_id status artifact_uri start_time end_time metrics.coverage metrics.mape metrics.rmse params.partition ... tags.partition tags.type tags.mlflow.source.name tags.mlflow.runName tags.run_name tags.unique_id tags.mlflow.source.type tags.mlflow.user tags.model_name tags.model_label
0 3b6c0f1d216a4914a76d0e5c0f90e316 796115560103651146 FINISHED file:///mlruns/3b6c0f1d216a4914a76d0e5c0f90e31... 2025-01-12 20:45:05.917000+00:00 2025-01-12 20:45:06.077000+00:00 0.861111 0.039062 20619.197374 3 ... 3 backtesting /opt/forecasting-poc/lib/python3.10/site-packa... knn-2025-01-12 20-45-03 knn-2025-01-12 20-45-03 1 LOCAL root KNeighborsRegressor knn
1 80302de0c1954afdaf121afb909e7788 796115560103651146 FINISHED file:///mlruns/80302de0c1954afdaf121afb909e778... 2025-01-12 20:45:05.039000+00:00 2025-01-12 20:45:05.200000+00:00 0.680556 0.040188 22320.745610 2 ... 2 backtesting /opt/forecasting-poc/lib/python3.10/site-packa... enet3-2025-01-12 20-45-03 enet3-2025-01-12 20-45-03 1 LOCAL root ElasticNet enet3
2 73611b1476b3406c84ceaf8f8940e851 796115560103651146 FINISHED file:///mlruns/73611b1476b3406c84ceaf8f8940e85... 2025-01-12 20:45:04.300000+00:00 2025-01-12 20:45:04.484000+00:00 0.722222 0.040316 22284.082250 2 ... 2 backtesting /opt/forecasting-poc/lib/python3.10/site-packa... enet2-2025-01-12 20-45-03 enet2-2025-01-12 20-45-03 1 LOCAL root ElasticNet enet2
3 302336a224ff4683a8e174920aa1577c 796115560103651146 FINISHED file:///mlruns/302336a224ff4683a8e174920aa1577... 2025-01-12 20:45:03.560000+00:00 2025-01-12 20:45:03.729000+00:00 0.763889 0.040436 22229.766257 2 ... 2 backtesting /opt/forecasting-poc/lib/python3.10/site-packa... enet1-2025-01-12 20-45-03 enet1-2025-01-12 20-45-03 1 LOCAL root ElasticNet enet1
4 aebebe58489447548066582943d6b64a 796115560103651146 FINISHED file:///mlruns/aebebe58489447548066582943d6b64... 2025-01-12 20:45:03.382000+00:00 2025-01-12 20:45:03.552000+00:00 0.805556 0.046022 25850.656158 1 ... 1 backtesting /opt/forecasting-poc/lib/python3.10/site-packa... enet1-2025-01-12 20-45-03 enet1-2025-01-12 20-45-03 1 LOCAL root ElasticNet enet1

5 rows × 61 columns

Plot error distribution:

import plotly.express as px
fig = px.box(x= results["tags.model_label"], y= 100 * results["metrics.mape"], color= results["tags.model_name"])

# Add jitter
fig.update_traces(boxpoints='all', jitter=0.3, pointpos=-2)

fig.update_layout(
    title = "Models Error Distribution",
    legend_title_text = "Models Family",
    xaxis_title="Model Label",
    yaxis_title="MAPE (%)"
)

Identify best model:

leaderboard = (results.
groupby(["experiment_id", "status", "tags.model_label", "tags.model_name"])["metrics.mape"]
.mean()
.reset_index()
.sort_values(by = "metrics.mape")
)

leaderboard
experiment_id status tags.model_label tags.model_name metrics.mape
0 796115560103651146 FINISHED enet1 ElasticNet 0.047609
1 796115560103651146 FINISHED enet2 ElasticNet 0.047890
2 796115560103651146 FINISHED enet3 ElasticNet 0.048207
4 796115560103651146 FINISHED mlp1 MLPRegressor 0.049627
6 796115560103651146 FINISHED mlp3 MLPRegressor 0.050400
3 796115560103651146 FINISHED knn KNeighborsRegressor 0.055683
5 796115560103651146 FINISHED mlp2 MLPRegressor 0.060390