Backtesting with ML models

Let’s train regression models with different regression models.

Loading the data

Utilies and data libraries

import pandas as pd
import datetime
import json

Load the metadata

raw_json = open("./settings/settings.json")
meta_json = json.load(raw_json)
backtesting_path = meta_json["data"]["backtesting_path"]

Load the dataset and reformat it

ts = pd.read_csv("data/data.csv")
ts["ds"] = pd.to_datetime(ts["ds"])
ts = ts.sort_values("ds")
ts = ts[["unique_id", "ds", "y"]]
ts.head()

	unique_id	ds	y
0	1	2022-11-11 23:00:00	456403
1	1	2022-11-12 00:00:00	458842
2	1	2022-11-12 01:00:00	455111
3	1	2022-11-12 02:00:00	448035
4	1	2022-11-12 03:00:00	438165

Set the data

# os.environ["NIXTLA_ID_AS_COL"] = "1"

from utilsforecast.plotting import plot_series
plot_series(ts, engine = "plotly").update_layout(height=300)

Set the Backtesting Process

In the following example, we will demonstrate how to set a simple backtesting process that train multiple machine learning models.

Let’s start by defining the models:

Regression based on k-nearest neighbors (see model documentation)
Multi-layer Perceptron regressor (see model documentation)
ElasticNet - Linear regression with combined L1 and L2 priors as regularizer (see model documentation)

Note that we will set an initial higher max iteration for the Multi-layer Perceptron and ElasticNet models using the max_iter argument:

from sklearn.linear_model import  ElasticNet
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor



ml_models = {
    "knn": KNeighborsRegressor(),
    "mlp": MLPRegressor(max_iter=1000),
    "enet": ElasticNet(max_iter=1000)

}

We will use the following features:

Lags - lag 1 and 24
Seasonal features - monthly, daily, day of the week, weekly

lags = [1, 24]
date_features = ["month", "day", "dayofweek", "week", "hour"]

Let’s define the forecast object:

from mlforecast import MLForecast
from mlforecast.utils import PredictionIntervals

mlf = MLForecast(
    models=ml_models,
    freq="h",
    lags = lags,
    date_features = date_features
)

Next, we will define the backtesting parameters. We will use a backtesting with four testing partitions, each testing partition with length of 72 hours, and overlapping of 12 hours between each partition. In adddion we will set a 95% prediction intervals using conformal distribution method:

h = 72
step_size = 12
partitions = 4
n_windows = 3
method = "conformal_distribution"
pi = PredictionIntervals(h=h, n_windows = n_windows, method = method)
levels = [95]

Let’s run the backtesting using the cross_validation method:

bkt_df = mlf.cross_validation(
        df = ts,
        step_size= step_size,
        n_windows=partitions,
        prediction_intervals=PredictionIntervals(n_windows=2, h=h),
        level= levels,
        h=h,
        fitted=True,)

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.475e+11, tolerance: 9.972e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.601e+11, tolerance: 9.997e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.552e+11, tolerance: 9.973e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.639e+11, tolerance: 9.998e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.564e+11, tolerance: 9.976e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.636e+11, tolerance: 1.000e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.561e+11, tolerance: 9.976e+09

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.512e+11, tolerance: 1.000e+10

bkt_df.head()

	unique_id	ds	cutoff	y	knn	mlp	enet	knn-lo-95	knn-hi-95	mlp-lo-95	mlp-hi-95	enet-lo-95	enet-hi-95
0	1	2024-11-26 12:00:00	2024-11-26 11:00:00	423022	416257.1875	409043.81250	413452.84375	408580.041406	423934.333594	396099.003906	421988.621094	402097.917187	424807.770313
1	1	2024-11-26 13:00:00	2024-11-26 11:00:00	446189	420906.1875	420555.43750	428518.75000	400963.112500	440849.262500	398524.904687	442585.970313	409926.205469	447111.294531
2	1	2024-11-26 14:00:00	2024-11-26 11:00:00	458839	427260.0000	430777.03125	441908.21875	408422.626563	446097.373437	408585.832813	452968.229687	424869.825781	458946.611719
3	1	2024-11-26 15:00:00	2024-11-26 11:00:00	464379	443909.8125	438137.00000	452244.65625	425663.901562	462155.723438	416472.705469	459801.294531	436616.064063	467873.248437
4	1	2024-11-26 16:00:00	2024-11-26 11:00:00	465544	453961.8125	443568.50000	460505.53125	430425.642188	477497.982812	414669.038281	472467.961719	437231.853125	483779.209375

For convenience reasons, we will map to the partition label their numeric order (as opposed to the timestamp):

::: {#cell-Partition mapping .cell execution_count=12}

cutoff = bkt_df["cutoff"].unique()
partitions_mapping = pd.DataFrame({"cutoff": cutoff, 
    "partition": range(1, len(cutoff) + 1)})

partitions_mapping.head()

	cutoff	partition
0	2024-11-26 11:00:00	1
1	2024-11-26 23:00:00	2
2	2024-11-27 11:00:00	3
3	2024-11-27 23:00:00	4

:::

Let’s merge it to the backtesting table:

bkt_df = bkt_df.merge(partitions_mapping, how = "left", on = ["cutoff"])

Last but not least, let’s plot the results:

from plotly.subplots import make_subplots
import plotly.graph_objects as go

partitions_labels =  bkt_df["partition"].unique()

ts_sub = ts[ts["ds"] > ts["ds"].max() -  datetime.timedelta(hours = 24 * 7)]
fig = make_subplots(rows=partitions, cols=1, subplot_titles= ["Partitions: " + str(i) for i in partitions_labels])


r = 1

for i in partitions_labels:
    if r == 1:
        showlegend = True
    else:
        showlegend = False
    bkt_sub = bkt_df[bkt_df["partition"] == i]
    fig.append_trace(go.Scatter(x= ts_sub["ds"], y=ts_sub["y"], legendgroup = "actual", showlegend = showlegend, mode='lines', name='Actual', line=dict(color='#023047', width=2)), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["knn"], mode='lines', name='k-nearest neighbors', legendgroup = "knn", showlegend = showlegend, line=dict(color='#2a9d8f', width=1.5, dash = "dash")), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["mlp"], mode='lines', name='Multi-layer Perceptron',legendgroup = "mlp", showlegend = showlegend, line=dict(color='#0077b6', width=1.5, dash = "dot")), row = r, col = 1)
    fig.append_trace(go.Scatter(x=bkt_sub["ds"], y= bkt_sub["enet"], mode='lines', name='ElasticNet',legendgroup = "enet", showlegend = showlegend, line=dict(color='#ffc8dd', width=1.5, dash = "dot")), row = r, col = 1)
    r = r + 1


fig.update_layout(height=600)
fig.show()

Scoring the models

In this section, we will process the backtesting output and score the models. This includes the following steps: - Transition the backtesting dataframe from wide to long format - Calculate the models performance on each testing partition

We will use the following error metrics to evaluate the models performance: - MAPE - Mean Absolute Percentage Error - RMSE - Root Mean Square Error - Coverage - percentage of actual values that were within the prediction intervals range

Let’s reformat the data, transform the backtesting table - bkt_df from wide to long.

We will use the melt function transition the table into long format, where we assign the transform fields names into new column named model_label and the corresponding values into the value column:

models = list(ml_models.keys()) 
bkt_long = pd.melt(
    bkt_df,
    id_vars=["unique_id", "ds", "partition", "y"],
    value_vars=models + [f"{model}-lo-95" for model in models] \
                      + [f"{model}-hi-95" for model in models],
    var_name="model_label",
    value_name="value",
)


bkt_long.head()

	unique_id	ds	partition	y	model_label	value
0	1	2024-11-26 12:00:00	1	423022	knn	416257.1875
1	1	2024-11-26 13:00:00	1	446189	knn	420906.1875
2	1	2024-11-26 14:00:00	1	458839	knn	427260.0000
3	1	2024-11-26 15:00:00	1	464379	knn	443909.8125
4	1	2024-11-26 16:00:00	1	465544	knn	453961.8125

We will use the following function to relabel the forecast and prediction intervals values into forecast, lower and upper:

::: {#cell-Relabel the PI field .cell execution_count=16}

def split_model_confidence(model_name):
    if "-lo-95" in model_name:
        return model_name.replace("-lo-95", ""), "lower"
    elif "-hi-95" in model_name:
        return model_name.replace("-hi-95", ""), "upper"
    else:
        return model_name, "forecast"

bkt_long["model_label"],\
bkt_long["type"] = zip(*bkt_long["model_label"].map(split_model_confidence))
bkt_long.head()

	unique_id	ds	partition	y	model_label	value	type
0	1	2024-11-26 12:00:00	1	423022	knn	416257.1875	forecast
1	1	2024-11-26 13:00:00	1	446189	knn	420906.1875	forecast
2	1	2024-11-26 14:00:00	1	458839	knn	427260.0000	forecast
3	1	2024-11-26 15:00:00	1	464379	knn	443909.8125	forecast
4	1	2024-11-26 16:00:00	1	465544	knn	453961.8125	forecast

:::

In addition, we will map the model functions name to the labels we created and merge it later with the backtesting table:

model_label = list(ml_models.keys()) 
model_name = [type(s).__name__ for s in list(ml_models.values())] 

models_mapping = pd.DataFrame({"model_label": model_label, "model_name": model_name})

models_mapping

	model_label	model_name
0	knn	KNeighborsRegressor
1	mlp	MLPRegressor
2	enet	ElasticNet

Next, let’s use the pivot function to pivot the type filed into three new fields and merge it with the partitions mapping table:

bkt = (bkt_long
.pivot(index = ["unique_id", "ds", "model_label","partition", "y"], columns = "type", values = "value")
.reset_index()
.merge(models_mapping, how = "left", on = ["model_label"])
)
bkt.head()

	unique_id	ds	model_label	partition	y	forecast	lower	upper	model_name
0	1	2024-11-26 12:00:00	enet	1	423022	413452.84375	402097.917187	424807.770313	ElasticNet
1	1	2024-11-26 12:00:00	knn	1	423022	416257.18750	408580.041406	423934.333594	KNeighborsRegressor
2	1	2024-11-26 12:00:00	mlp	1	423022	409043.81250	396099.003906	421988.621094	MLPRegressor
3	1	2024-11-26 13:00:00	enet	1	446189	428518.75000	409926.205469	447111.294531	ElasticNet
4	1	2024-11-26 13:00:00	knn	1	446189	420906.18750	400963.112500	440849.262500	KNeighborsRegressor

Now we can score the models results using the following helpers functions:

from statistics import mean
def mape(y, yhat):
    mape = mean(abs(y - yhat)/ y) 
    return mape

def rmse(y, yhat):
    rmse = (mean((y - yhat) ** 2 )) ** 0.5
    return rmse

def coverage(y, lower, upper):
    coverage = sum((y <= upper) & (y >= lower)) / len(y)
    return coverage

def score(df):
    mape_score = mape(y = df["y"], yhat = df["forecast"])
    rmse_score = rmse(y = df["y"], yhat = df["forecast"])
    coverage_score = coverage(y = df["y"], lower = df["lower"], upper = df["upper"])
    cols = ["mape","rmse", "coverage"]
    d = pd.Series([mape_score, rmse_score,  coverage_score], index=cols)

    return d

We will group by backtesting table by the series unique id, model label and partition and calculate its score:

score_df = (bkt
.groupby(["unique_id", "model_label", "model_name", "partition"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
)

score_df

	unique_id	model_label	model_name	partition	mape	rmse	coverage
0	1	enet	ElasticNet	1	0.046387	26248.918613	0.791667
1	1	enet	ElasticNet	2	0.040316	22284.199211	0.722222
2	1	enet	ElasticNet	3	0.048238	25473.070455	0.541667
3	1	enet	ElasticNet	4	0.056620	29758.941497	0.597222
4	1	knn	KNeighborsRegressor	1	0.071400	38516.048318	0.708333
5	1	knn	KNeighborsRegressor	2	0.065884	36684.848083	0.597222
6	1	knn	KNeighborsRegressor	3	0.039062	20619.197374	0.861111
7	1	knn	KNeighborsRegressor	4	0.046386	24855.016077	0.763889
8	1	mlp	MLPRegressor	1	0.051659	26748.692236	0.875000
9	1	mlp	MLPRegressor	2	0.049206	25506.360560	0.847222
10	1	mlp	MLPRegressor	3	0.042411	22529.148592	0.763889
11	1	mlp	MLPRegressor	4	0.051685	27958.964509	0.722222

Creating a leaderboard table:

leaderboard = (bkt
.groupby(["unique_id", "model_label", "model_name"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
.sort_values(by = "mape")
)

leaderboard

	unique_id	model_label	model_name	mape	rmse	coverage
0	1	enet	ElasticNet	0.047890	26077.112476	0.663194
2	1	mlp	MLPRegressor	0.048740	25764.962634	0.802083
1	1	knn	KNeighborsRegressor	0.055683	31113.450393	0.732639

Experimentation

Let’s now generalized the previous steps and set up an experimentation. The mean goal is to identify which model perform best. This includes identify the tuning parameters and features yield the best performance.

Let’s take the three models we used before and try different tuning parameters: - For the Multi-layer Perceptron regressor we will test different hidden layer size setting - For the ElasticNet model we will test different l1 ratio which defines the ration between L1 and L2 penalty

ml_models = {
    "knn": KNeighborsRegressor(),
    "mlp1": MLPRegressor(max_iter=2000, hidden_layer_sizes = (100,)),
    "mlp2": MLPRegressor(max_iter=2000, hidden_layer_sizes = (50,)),
    "mlp3": MLPRegressor(max_iter=2000, hidden_layer_sizes = (200,)),
    "enet1": ElasticNet(max_iter=2000, l1_ratio = 0, tol=0.001),
    "enet2": ElasticNet(max_iter=2000, l1_ratio = 0.5, tol=0.001),
    "enet3": ElasticNet(max_iter=2000, l1_ratio = 1, tol=0.001),

}

We will use the same features settings as before:

lags = [1, 24]
date_features = ["month", "day", "dayofweek", "week", "hour"]

And the same backtesting settings:

h = 72
step_size = 12
partitions = 4
n_windows = 3
method = "conformal_distribution"
pi = PredictionIntervals(h=h, n_windows = n_windows, method = method)
levels = [95]

Let’s set the forecasting object:

mlf = MLForecast(
    models=ml_models,
    freq="h",
    lags = lags,
    date_features= date_features
)

And apply the backtesting:

bkt_df = mlf.cross_validation(
        df = ts,
        step_size= step_size,
        n_windows=partitions,
        prediction_intervals=PredictionIntervals(n_windows=2, h=h),
        level= levels,
        h=h,
        fitted=True,)

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.717e+11, tolerance: 9.972e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.518e+11, tolerance: 9.972e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.344e+11, tolerance: 9.972e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.781e+11, tolerance: 9.997e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.616e+11, tolerance: 9.997e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.720e+11, tolerance: 9.973e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.560e+11, tolerance: 9.973e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.395e+11, tolerance: 9.973e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.785e+11, tolerance: 9.998e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.635e+11, tolerance: 9.998e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.458e+11, tolerance: 9.998e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.724e+11, tolerance: 9.976e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.571e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.341e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.787e+11, tolerance: 1.000e+11 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.613e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.426e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.743e+11, tolerance: 9.976e+10 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.573e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.337e+11, tolerance: 9.976e+10

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.790e+11, tolerance: 1.000e+11 Linear regression models with null weight for the l1 regularization term are more efficiently fitted using one of the solvers implemented in sklearn.linear_model.Ridge/RidgeCV instead.

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.549e+11, tolerance: 1.000e+11

/opt/forecasting-poc/lib/python3.10/site-packages/sklearn/linear_model/_coordinate_descent.py:697: ConvergenceWarning:

Objective did not converge. You might want to increase the number of iterations, check the scale of the features or consider increasing regularisation. Duality gap: 9.474e+11, tolerance: 1.000e+11

We will use the following function to transform the backtesting object from wide to long:

def bkt_wide_2_long(bkt, models):
    # Mapping the models labels
    model_labels = list(models.keys())
    model_name = [type(s).__name__ for s in models.values()] 
    models_mapping = pd.DataFrame({"model_label": model_labels, "model_name": model_name})
    # Mapping the partitions
    cutoff = bkt["cutoff"].unique()
    partitions_mapping = pd.DataFrame({"cutoff": cutoff, 
    "partition": range(1, len(cutoff) + 1)})
    bkt = bkt.merge(partitions_mapping, how = "left", on = ["cutoff"])
    # Melting the bkt object to long
    bkt_long = pd.melt(
        bkt,
        id_vars=["unique_id", "ds", "partition", "y"],
        value_vars=model_labels + [f"{model}-lo-95" for model in model_labels] \
                          + [f"{model}-hi-95" for model in model_labels],
        var_name="model_label",
        value_name="value",
    )

    bkt_long["model_label"],bkt_long["type"] = zip(*bkt_long["model_label"].map(split_model_confidence))
    
    bkt = (bkt_long
            .pivot(index = ["unique_id", "ds", "model_label","partition", "y"], columns = "type", values = "value")
            .reset_index()
            .merge(models_mapping, how = "left", on = ["model_label"])
            )

    return bkt

bkt = bkt_wide_2_long(bkt = bkt_df, models = ml_models)

bkt.head()

	unique_id	ds	model_label	partition	y	forecast	lower	upper	model_name
0	1	2024-11-26 12:00:00	enet1	1	423022	412914.00000	401216.973438	424611.026562	ElasticNet
1	1	2024-11-26 12:00:00	enet2	1	423022	413452.84375	402097.917187	424807.770313	ElasticNet
2	1	2024-11-26 12:00:00	enet3	1	423022	414156.62500	403249.959375	425063.290625	ElasticNet
3	1	2024-11-26 12:00:00	knn	1	423022	416257.18750	408580.041406	423934.333594	KNeighborsRegressor
4	1	2024-11-26 12:00:00	mlp1	1	423022	409429.62500	396350.746875	422508.503125	MLPRegressor

score_df = (bkt
.groupby(["unique_id", "model_label", "model_name", "partition"])[["unique_id", "model_label", "model_name", "partition", "y", "forecast", "lower", "upper"]]
.apply(score)
.reset_index()
)

score_df

	unique_id	model_label	model_name	partition	mape	rmse	coverage
0	1	enet1	ElasticNet	1	0.046022	25850.656158	0.805556
1	1	enet1	ElasticNet	2	0.040436	22229.766257	0.763889
2	1	enet1	ElasticNet	3	0.047834	25163.856352	0.569444
3	1	enet1	ElasticNet	4	0.056144	29338.262505	0.569444
4	1	enet2	ElasticNet	1	0.046387	26248.881182	0.791667
5	1	enet2	ElasticNet	2	0.040316	22284.082250	0.722222
6	1	enet2	ElasticNet	3	0.048238	25473.073322	0.541667
7	1	enet2	ElasticNet	4	0.056620	29758.980748	0.597222
8	1	enet3	ElasticNet	1	0.046769	26734.650723	0.750000
9	1	enet3	ElasticNet	2	0.040188	22320.745610	0.680556
10	1	enet3	ElasticNet	3	0.048707	25853.755042	0.555556
11	1	enet3	ElasticNet	4	0.057163	30287.218437	0.569444
12	1	knn	KNeighborsRegressor	1	0.071400	38516.048318	0.708333
13	1	knn	KNeighborsRegressor	2	0.065884	36684.848083	0.597222
14	1	knn	KNeighborsRegressor	3	0.039062	20619.197374	0.861111
15	1	knn	KNeighborsRegressor	4	0.046386	24855.016077	0.763889
16	1	mlp1	MLPRegressor	1	0.052514	27242.245521	0.861111
17	1	mlp1	MLPRegressor	2	0.048664	24604.639462	0.847222
18	1	mlp1	MLPRegressor	3	0.047530	23878.789059	0.750000
19	1	mlp1	MLPRegressor	4	0.049799	27118.050934	0.680556
20	1	mlp2	MLPRegressor	1	0.059734	32490.520613	0.722222
21	1	mlp2	MLPRegressor	2	0.051235	27731.687214	0.763889
22	1	mlp2	MLPRegressor	3	0.079752	42953.540592	0.652778
23	1	mlp2	MLPRegressor	4	0.050840	27427.795038	0.722222
24	1	mlp3	MLPRegressor	1	0.050637	26258.505875	0.875000
25	1	mlp3	MLPRegressor	2	0.049389	25730.788109	0.750000
26	1	mlp3	MLPRegressor	3	0.051717	27861.066305	0.736111
27	1	mlp3	MLPRegressor	4	0.049857	26739.627054	0.666667

Logging the Results to MLflow

Let’s load the MLflow library and define the experiment name:

import mlflow
import datetime
experiment_name = "ml_forecast_exp01"
mlflow_path = "file:///mlruns"

We will log the backtesting parameters at tag:

tags = {"h": h,
"step_size": step_size,
"partitions": partitions,
"intervals_type": "ConformalIntervals",
"intervals_h": h,
"intervals_n_windows": n_windows,
"intervals_method": "conformal_distribution",
"levels": levels }

try:
    mlflow.create_experiment(name = experiment_name,
                            artifact_location= mlflow_path,
                            tags = tags)
    meta = mlflow.get_experiment_by_name(experiment_name)
    print(f"Set a new experiment {experiment_name}")
    print("Pulling the metadata") 
except: 
    print(f"Experiment {experiment_name} exists, pulling the metadata")
    meta = mlflow.get_experiment_by_name(experiment_name)

Set a new experiment ml_forecast_exp01
Pulling the metadata

run_time = datetime.datetime.now().strftime("%Y-%m-%d %H-%M-%S")

for index, row in score_df.iterrows():
    run_name = row["model_label"] + "-" + run_time 
    with mlflow.start_run(experiment_id = meta.experiment_id, run_name = run_name,
                tags = {"type": "backtesting","partition": row["partition"], 
                "unique_id": row["unique_id"],"model_label": row["model_label"],
                "model_name": row["model_name"],"run_name": run_name}) as run:
        model_params = ml_models[row["model_label"]].get_params() 
        model_params["model_name"] =  row["model_name"]
        model_params["model_label"] =  row["model_label"]
        model_params["partition"] =  row["partition"]
        model_params["lags"] =  lags
        model_params["date_features"] = date_features
        mlflow.log_params(model_params)
        mlflow.log_metric("mape", row["mape"])
        mlflow.log_metric("rmse", row["rmse"])
        mlflow.log_metric("coverage", row["coverage"])

results = mlflow.search_runs(experiment_ids=[meta.experiment_id], order_by=["metrics.mape"])

results.head()

	run_id	experiment_id	status	artifact_uri	start_time	end_time	metrics.coverage	metrics.mape	metrics.rmse	params.partition	...	tags.partition	tags.type	tags.mlflow.source.name	tags.mlflow.runName	tags.run_name	tags.unique_id	tags.mlflow.source.type	tags.mlflow.user	tags.model_name	tags.model_label
0	3b6c0f1d216a4914a76d0e5c0f90e316	796115560103651146	FINISHED	file:///mlruns/3b6c0f1d216a4914a76d0e5c0f90e31...	2025-01-12 20:45:05.917000+00:00	2025-01-12 20:45:06.077000+00:00	0.861111	0.039062	20619.197374	3	...	3	backtesting	/opt/forecasting-poc/lib/python3.10/site-packa...	knn-2025-01-12 20-45-03	knn-2025-01-12 20-45-03	1	LOCAL	root	KNeighborsRegressor	knn
1	80302de0c1954afdaf121afb909e7788	796115560103651146	FINISHED	file:///mlruns/80302de0c1954afdaf121afb909e778...	2025-01-12 20:45:05.039000+00:00	2025-01-12 20:45:05.200000+00:00	0.680556	0.040188	22320.745610	2	...	2	backtesting	/opt/forecasting-poc/lib/python3.10/site-packa...	enet3-2025-01-12 20-45-03	enet3-2025-01-12 20-45-03	1	LOCAL	root	ElasticNet	enet3
2	73611b1476b3406c84ceaf8f8940e851	796115560103651146	FINISHED	file:///mlruns/73611b1476b3406c84ceaf8f8940e85...	2025-01-12 20:45:04.300000+00:00	2025-01-12 20:45:04.484000+00:00	0.722222	0.040316	22284.082250	2	...	2	backtesting	/opt/forecasting-poc/lib/python3.10/site-packa...	enet2-2025-01-12 20-45-03	enet2-2025-01-12 20-45-03	1	LOCAL	root	ElasticNet	enet2
3	302336a224ff4683a8e174920aa1577c	796115560103651146	FINISHED	file:///mlruns/302336a224ff4683a8e174920aa1577...	2025-01-12 20:45:03.560000+00:00	2025-01-12 20:45:03.729000+00:00	0.763889	0.040436	22229.766257	2	...	2	backtesting	/opt/forecasting-poc/lib/python3.10/site-packa...	enet1-2025-01-12 20-45-03	enet1-2025-01-12 20-45-03	1	LOCAL	root	ElasticNet	enet1
4	aebebe58489447548066582943d6b64a	796115560103651146	FINISHED	file:///mlruns/aebebe58489447548066582943d6b64...	2025-01-12 20:45:03.382000+00:00	2025-01-12 20:45:03.552000+00:00	0.805556	0.046022	25850.656158	1	...	1	backtesting	/opt/forecasting-poc/lib/python3.10/site-packa...	enet1-2025-01-12 20-45-03	enet1-2025-01-12 20-45-03	1	LOCAL	root	ElasticNet	enet1

5 rows × 61 columns

Plot error distribution:

import plotly.express as px
fig = px.box(x= results["tags.model_label"], y= 100 * results["metrics.mape"], color= results["tags.model_name"])

# Add jitter
fig.update_traces(boxpoints='all', jitter=0.3, pointpos=-2)

fig.update_layout(
    title = "Models Error Distribution",
    legend_title_text = "Models Family",
    xaxis_title="Model Label",
    yaxis_title="MAPE (%)"
)

Identify best model:

leaderboard = (results.
groupby(["experiment_id", "status", "tags.model_label", "tags.model_name"])["metrics.mape"]
.mean()
.reset_index()
.sort_values(by = "metrics.mape")
)

leaderboard

	experiment_id	status	tags.model_label	tags.model_name	metrics.mape
0	796115560103651146	FINISHED	enet1	ElasticNet	0.047609
1	796115560103651146	FINISHED	enet2	ElasticNet	0.047890
2	796115560103651146	FINISHED	enet3	ElasticNet	0.048207
4	796115560103651146	FINISHED	mlp1	MLPRegressor	0.049627
6	796115560103651146	FINISHED	mlp3	MLPRegressor	0.050400
3	796115560103651146	FINISHED	knn	KNeighborsRegressor	0.055683
5	796115560103651146	FINISHED	mlp2	MLPRegressor	0.060390