Modeling Trend

Author

Rami Krispin

Published

December 1, 2025

After reviewing the core methods for analyzing time series patterns using decomposition, correlation, and seasonal plots, in this notebook we will build our first forecasting model, focusing on modeling trend. To simplify the process, we will start with a time series that has no seasonal components or other complex patterns. Later, we will extend these ideas to series that include both trend and seasonality, as well as those affected by outliers.

Modeling Linear Trend

Before diving into the modeling process, let’s start with the definition of trend:

A trend in a time series is the long-term, systematic movement or direction in the data over time. It reflects persistent increases or decreases in the series that are not due to short-term fluctuations, randomness, or seasonal patterns. Trends can be linear—changing at a constant rate—or nonlinear, accelerating or decelerating as time progresses. Identifying the trend helps isolate the underlying structural behavior of the series and forms the foundation for many forecasting models, especially those that assume the future will continue to follow the same long-term direction observed in the past.

Trends can take linear or nonlinear forms. A linear trend has a constant rate of change, meaning that the series grows (or declines) by the same amount at each time step. From a mathematical perspective, a linear trend’s slope represents the marginal change in the series with respect to time.
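As a quick illustration (simulated data, not the workshop series), a linear trend with slope b means the series changes by roughly b units at each time step, and regressing the series on its time index recovers that slope:

```r
# Toy example: simulate a series with a linear trend of slope 50
set.seed(123)
n <- 100
trend <- 1:n
y <- 1000 + 50 * trend + rnorm(n, sd = 20)

# Regressing the series on its index recovers the slope (close to 50)
fit <- lm(y ~ trend)
coef(fit)
```

The `trend` coefficient is the estimated marginal change of the series per time step.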

Let’s load the series of the number of natural gas consumers in California and start building a forecasting model.

Setup

Let’s start by loading the required libraries and sourcing the workshop functions:

library(feasts)
library(fabletools)
library(tsibble)
library(dplyr)
library(plotly)

source("./functions.R")

Next, let’s load the datasets:

load(file = "./data/ts.RData")

And plot it:

ts1 <- ts1 |>
    dplyr::filter(index > 1986)


p <- plot_ly() |>
    add_lines(x = ts1$index, y = ts1$y, type = "scatter", mode = "lines", name = "Actual") |>
    plotly::layout(
        title = "Number of Natural Gas Consumers in California",
        yaxis = list(title = "Number of Consumers"),
        xaxis = list(title = "Period")
    )

p

Modeling a Simple Trend

As mentioned above, a linear trend is equivalent to the marginal change in the series over time. Therefore, to model the trend we can regress the series against its index:

\[ Y \sim Index \]

where the Index is simply a vector of numbers that increases as a function of time:

\[ Index = [1, 2, 3, \dots, N] \quad \text{for a series with } N \text{ observations} \]

Let’s add a trend feature to the series:

ts1$trend <- 1:nrow(ts1)

ts1 |> head()
# A tsibble: 6 x 4 [1Y]
  index       y series_id trend
  <int>   <int> <chr>     <int>
1  1987 7904858 SCA           1
2  1988 8113034 SCA           2
3  1989 8313776 SCA           3
4  1990 8497848 SCA           4
5  1991 8634774 SCA           5
6  1992 8680613 SCA           6

Let’s now fit a regression model:

md1 <- lm(y ~ trend, data = ts1)

summary(md1)

Call:
lm(formula = y ~ trend, data = ts1)

Residuals:
    Min      1Q  Median      3Q     Max 
-312554 -102254  -20905  110378  341535 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  8123398      55342  146.79   <2e-16 ***
trend          94014       2474   38.01   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 167200 on 36 degrees of freedom
Multiple R-squared:  0.9757,    Adjusted R-squared:  0.975 
F-statistic:  1444 on 1 and 36 DF,  p-value: < 2.2e-16

Next, let’s generate the fitted values to see how well the trend fits the series:

fit1 <- predict(object = md1, newdata = ts1, interval = "confidence", level = 0.95)
ts1$fit1 <- fit1[, 1]
p1 <- p |>
    add_lines(x = ts1$index, y = ts1$fit1, mode = "lines", line = list(color = "black", dash = "dash"), name = "Fitted")

p1

Residuals Analysis

Once a linear regression model is fitted to a time series, residual analysis is essential for verifying whether the model has adequately captured the underlying structure of the data. The goal is to ensure that the remaining errors—what the model could not explain—behave like white noise: uncorrelated, constant-variance, zero-mean noise with no remaining pattern. If the residuals are not white noise, the model is missing structure such as trend, seasonality, or autocorrelation, and the forecasts may be biased or unreliable.
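To build intuition for what well-behaved residuals look like, we can simulate pure white noise and inspect its ACF with base R (the workshop’s plot_acf helper produces a similar view):

```r
# White noise: uncorrelated, zero-mean, constant-variance
set.seed(42)
wn <- rnorm(200)

# Beyond lag 0, all autocorrelations should fall inside the
# approximate +/- 1.96 / sqrt(n) confidence bands
acf(wn, lag.max = 20, main = "ACF of simulated white noise")
```

This is the reference picture: if a model’s residual ACF looks like this, no linear time dependence is left to model.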

To evaluate these assumptions, we use a set of diagnostic plots:

Actual vs Fitted Plot

This plot compares the original series to the model’s fitted values.

  • Goal: Detect whether the model captured the main trend (and seasonality, if included).
  • What to look for:
    • The fitted line should track the general shape of the actual data.
    • Systematic deviations—such as fitted values consistently lagging behind the actual series—suggest underfitting or missing components.
  • Common issues revealed: Remaining trend, seasonal patterns, or structural changes.

Residuals ACF Plot

The autocorrelation function (ACF) of residuals checks whether errors are correlated over time.

  • Goal: Verify the assumption of no autocorrelation in residuals.
  • What to look for:
    • All autocorrelations should fall within the confidence bands (no significant spikes).
    • Significant autocorrelation indicates that:
      • the model did not fully capture the time dependence, or
      • additional lagged terms, ARIMA errors, or differencing may be needed.
  • Why it matters: Autocorrelated errors imply that the model’s predictions could be improved by modeling the remaining serial dependence.
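A numeric companion to the residuals ACF plot is the Ljung-Box test, available in base R via Box.test; a small p-value indicates significant residual autocorrelation. A sketch on simulated data:

```r
# Simulated examples: autocorrelated vs. uncorrelated "residuals"
set.seed(1)
bad_res  <- arima.sim(model = list(ar = 0.7), n = 150)  # AR(1): correlated
good_res <- rnorm(150)                                  # white noise

# A small p-value rejects the null of no autocorrelation
Box.test(bad_res, lag = 10, type = "Ljung-Box")
Box.test(good_res, lag = 10, type = "Ljung-Box")
```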

Residual Distribution Plots (Density and Q-Q Plots)

These plots assess whether residuals behave like noise with constant variance and approximately normal distribution.

  • Density Plot:
    • Shows the empirical distribution of residuals.
    • Goal: Check for skewness, heavy tails, or multimodality.
    • What to look for: A roughly symmetric, bell-shaped curve.
  • Q-Q Plot:
    • Compares the residuals to a theoretical normal distribution.
    • Goal: Assess normality and identify outliers.
    • What to look for: Points lying close to the diagonal line.
    • Common issues revealed:
      • Heavy tails → potential outliers
      • Systematic curvature → non-normal residuals or model misspecification

Let’s start by calculating the model residuals:

ts1$res1 <- ts1$y - ts1$fit1

plot_ly(x = ts1$index, y = ts1$res1, type = "scatter", mode = "markers")

It would be useful to add the standard deviation of the residuals to the plot:

# Calculate residual standard deviation
sd_res1 <- sd(ts1$res1)

plot_ly(x = ts1$index, y = ts1$res1, type = "scatter", mode = "markers", name = "Residuals") |>
    # Add +2SD line
    add_segments(
        x = min(ts1$index), xend = max(ts1$index),
        y = 2 * sd_res1, yend = 2 * sd_res1,
        line = list(color = "orange", dash = "dash", width = 2),
        name = "+2SD",
        showlegend = TRUE
    ) |>
    # Add -2SD line
    add_segments(
        x = min(ts1$index), xend = max(ts1$index),
        y = -2 * sd_res1, yend = -2 * sd_res1,
        line = list(color = "orange", dash = "dash", width = 2),
        name = "-2SD",
        showlegend = FALSE,
        legendgroup = "2SD"
    ) |>
    # Add +3SD line
    add_segments(
        x = min(ts1$index), xend = max(ts1$index),
        y = 3 * sd_res1, yend = 3 * sd_res1,
        line = list(color = "red", dash = "dash", width = 2),
        name = "+3SD",
        showlegend = TRUE
    ) |>
    # Add -3SD line
    add_segments(
        x = min(ts1$index), xend = max(ts1$index),
        y = -3 * sd_res1, yend = -3 * sd_res1,
        line = list(color = "red", dash = "dash", width = 2),
        name = "-3SD",
        showlegend = FALSE,
        legendgroup = "3SD"
    ) |>
    plotly::layout(
        yaxis = list(title = "Residuals"),
        xaxis = list(title = "Index")
    )

Clearly, you can see that some patterns are left in the residuals.

We can check for autocorrelation:

plot_acf(ts = ts1, var = "res1", lag_max = 60, frequency = NULL, alpha = 0.05)

The ACF plot indicates that the residuals are highly correlated with the first lags. This is a classic case where adding an ARIMA component (i.e., regression with ARIMA errors) could improve the fit of the model.
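One way to fit a regression with ARIMA errors is stats::arima with an xreg term. A minimal sketch on simulated data (for the workshop series, substitute ts1$y and ts1$trend for y and x):

```r
# Sketch: linear trend regression with AR(1) errors via stats::arima
set.seed(7)
n <- 120
x <- 1:n
e <- arima.sim(model = list(ar = 0.8), n = n)  # AR(1) error process
y <- 100 + 2 * x + 5 * e                       # linear trend + correlated noise

md_reg_ar <- arima(y, order = c(1, 0, 0), xreg = x)
cf <- coef(md_reg_ar)
cf  # ar1 should be estimated near 0.8 and the xreg slope near 2
```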

The last component that we want to check is the distribution of the residuals:

Residual Distribution

Let’s create a density plot to evaluate if the residuals are normally distributed:

# Calculate density
density_res <- density(ts1$res1)

# Create density plot
p_density <- plot_ly() |>
    add_trace(
        x = density_res$x,
        y = density_res$y,
        type = "scatter",
        mode = "lines",
        fill = "tozeroy",
        fillcolor = "rgba(0, 114, 181, 0.3)",
        line = list(color = "#0072B5", width = 2),
        name = "Residual Density"
    ) |>
    plotly::layout(
        title = "Distribution of Residuals",
        xaxis = list(title = "Residuals"),
        yaxis = list(title = "Density"),
        showlegend = TRUE
    )

# Overlay normal distribution for comparison
mean_res <- mean(ts1$res1)
sd_res <- sd(ts1$res1)
x_norm <- seq(min(ts1$res1), max(ts1$res1), length.out = 100)
y_norm <- dnorm(x_norm, mean = mean_res, sd = sd_res)

p_density <- p_density |>
    add_trace(
        x = x_norm,
        y = y_norm,
        type = "scatter",
        mode = "lines",
        line = list(color = "red", dash = "dash", width = 2),
        name = "Normal Distribution"
    )

p_density

The density plot shows the distribution of residuals (blue) overlaid with a theoretical normal distribution (red dashed line) with the same mean and standard deviation. If the residuals were perfectly normally distributed, the two curves would align closely.

Q-Q Plot

A Q-Q (quantile-quantile) plot provides another way to assess normality by comparing the quantiles of the residuals against the quantiles of a theoretical normal distribution:

# Calculate theoretical quantiles
n <- length(ts1$res1)
theoretical_quantiles <- qnorm(ppoints(n))

# Calculate sample quantiles (sorted residuals)
sample_quantiles <- sort(ts1$res1)

# Standardize the residuals for comparison
standardized_res <- (sample_quantiles - mean(ts1$res1)) / sd(ts1$res1)

# Create Q-Q plot
p_qq <- plot_ly() |>
    add_trace(
        x = theoretical_quantiles,
        y = standardized_res,
        type = "scatter",
        mode = "markers",
        marker = list(color = "#0072B5", size = 6, opacity = 0.6),
        name = "Sample Quantiles"
    ) |>
    add_trace(
        x = theoretical_quantiles,
        y = theoretical_quantiles,
        type = "scatter",
        mode = "lines",
        line = list(color = "red", dash = "dash", width = 2),
        name = "Normal Reference Line"
    ) |>
    plotly::layout(
        title = "Q-Q Plot: Residuals vs Normal Distribution",
        xaxis = list(title = "Theoretical Quantiles"),
        yaxis = list(title = "Sample Quantiles (Standardized)"),
        showlegend = TRUE
    )

p_qq

In a Q-Q plot, if the residuals follow a normal distribution, the points should fall approximately along the red reference line. Deviations from this line indicate departures from normality. Points curving away at the tails suggest heavy-tailed or skewed distributions.
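A formal complement to the Q-Q plot is the Shapiro-Wilk normality test from base R. A sketch on simulated data (for the workshop model, apply shapiro.test to ts1$res1):

```r
# Simulated examples: normal vs. skewed "residuals"
set.seed(5)
normal_res <- rnorm(100)
skewed_res <- rexp(100) - 1  # right-skewed, roughly zero-mean

shapiro.test(normal_res)  # p-value typically large: consistent with normality
shapiro.test(skewed_res)  # p-value very small: normality rejected
```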

The plot_residuals function returns all the plots together:

plot_residuals(
    data = ts1,
    index_col = "index",
    actual_col = "y",
    fitted_col = "fit1",
    lag_max = 60,
    alpha = 0.05
)

We can notice from the residuals and ACF plots that the linear trend captures the general direction of the series, yet some correlation is left in the residuals at the first lags. An AR model of order 1 can help remove this correlation.

Create a Forecast

Once we have trained the model on our series, we can use the model object to forecast the series’ future. The main requirement is to create the future data frame with the same features we used to train the model.

In the case of the above model, we need to create a future data frame with the continuation of the trend feature. Let’s assume we want to predict the next 5 years:

h <- 5

Next, we use the new_data function from the tsibble library to create a data frame with the continuation of the series index:

future_data <- new_data(ts1, n = h)
future_data
# A tsibble: 5 x 1 [1Y]
  index
  <dbl>
1  2025
2  2026
3  2027
4  2028
5  2029

Let’s now add the trend:

tail(ts1)
# A tsibble: 6 x 6 [1Y]
  index        y series_id trend      fit1     res1
  <int>    <int> <chr>     <int>     <dbl>    <dbl>
1  2019 11112094 SCA          33 11225849. -113755.
2  2020 11186350 SCA          34 11319862. -133512.
3  2021 11232552 SCA          35 11413876. -181324.
4  2022 11280329 SCA          36 11507890. -227561.
5  2023 11359812 SCA          37 11601903. -242091.
6  2024 11428636 SCA          38 11695917. -267281.
trend_start <- max(ts1$trend) + 1
future_data$trend <- trend_start:(trend_start + h - 1)
future_data
# A tsibble: 5 x 2 [1Y]
  index trend
  <dbl> <int>
1  2025    39
2  2026    40
3  2027    41
4  2028    42
5  2029    43

Let’s now use the predict.lm function to generate the forecast:

fc1 <- predict(
    object = md1,
    newdata = future_data,
    interval = "prediction",
    level = 0.95
)

future_data$yhat <- fc1[, 1]
future_data$lower <- fc1[, 2]
future_data$upper <- fc1[, 3]
future_data
# A tsibble: 5 x 5 [1Y]
  index trend      yhat     lower     upper
  <dbl> <int>     <dbl>     <dbl>     <dbl>
1  2025    39 11789931. 11432704. 12147157.
2  2026    40 11883944. 11525312. 12242577.
3  2027    41 11977958. 11617854. 12338061.
4  2028    42 12071971. 11710334. 12433609.
5  2029    43 12165985. 11802750. 12529220.

Let’s plot the forecast:

p1 |>
    add_ribbons(
        x = future_data$index, ymin = future_data$lower,
        ymax = future_data$upper, name = "95% Prediction Interval",
        line = list(color = "rgba(7, 164, 181, 0.05)"),
        fillcolor = "rgba(7, 164, 181, 0.2)"
    ) |>
    add_lines(
        x = future_data$index,
        y = future_data$yhat,
        name = "Forecast",
        line = list(color = "black", dash = "dash")
    )

That does not look like a great model…🥴

Improving the model

Reviewing the results, there are a few approaches we can use to improve the forecast:

  • Piecewise linear trend
  • Adding an AR component
  • Combining both approaches

Piecewise Linear Trend

Eyeballing the series, you can notice a change in the trend at around 2008. We will use this information to create a piecewise linear trend.
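The piecewise feature is built with pmax: it is zero up to the knot and grows linearly afterwards, so its regression coefficient captures the change in slope after the knot. A toy illustration with a knot at 5:

```r
# Hinge (piecewise) feature on a toy index with a knot at 5:
# zero up to the knot, then growing linearly
pmax(0, (1:10) - 5)
# [1] 0 0 0 0 0 1 2 3 4 5
```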

s <- ts1$trend[which(ts1$index == 2008)]
ts1$trend2 <- pmax(0, ts1$trend - s)

md2 <- lm(y ~ trend + trend2, data = ts1)

summary(md2)

Call:
lm(formula = y ~ trend + trend2, data = ts1)

Residuals:
    Min      1Q  Median      3Q     Max 
-144207  -28626   -5966   20935  165086 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  7888957      30324  260.15  < 2e-16 ***
trend         116146       2060   56.38  < 2e-16 ***
trend2        -55084       4383  -12.57 1.57e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 72230 on 35 degrees of freedom
Multiple R-squared:  0.9956,    Adjusted R-squared:  0.9953 
F-statistic:  3949 on 2 and 35 DF,  p-value: < 2.2e-16

Let’s fit the model and plot it:

fit2 <- predict(object = md2, newdata = ts1, interval = "confidence", level = 0.95)
ts1$fit2 <- fit2[, 1]
p2 <- p |>
    add_lines(x = ts1$index, y = ts1$fit2, mode = "lines", line = list(color = "black", dash = "dash"), name = "Fitted")
p2

Let’s add the trend features to the plot:

trend1 <- plot_ly() |>
    add_lines(x = ts1$index, y = ts1$trend, name = "Main Trend Feature")

trend2 <- plot_ly() |>
    add_lines(x = ts1$index, y = ts1$trend2, name = "Piecewise Trend Feature")
subplot(p2, trend1, trend2, nrows = 3)

Let’s check the model residuals:

plot_residuals(
    data = ts1,
    index_col = "index",
    actual_col = "y",
    fitted_col = "fit2",
    lag_max = 60,
    alpha = 0.05
)

Let’s update the future data frame with the new feature and create a forecast:

trend_start2 <- max(ts1$trend2) + 1
future_data$trend2 <- trend_start2:(trend_start2 + h - 1)
future_data
# A tsibble: 5 x 6 [1Y]
  index trend      yhat     lower     upper trend2
  <dbl> <int>     <dbl>     <dbl>     <dbl>  <int>
1  2025    39 11789931. 11432704. 12147157.     17
2  2026    40 11883944. 11525312. 12242577.     18
3  2027    41 11977958. 11617854. 12338061.     19
4  2028    42 12071971. 11710334. 12433609.     20
5  2029    43 12165985. 11802750. 12529220.     21
fc2 <- lm_forecast(
    model = md2,
    actual = ts1,
    future_data = future_data,
    actual_col = "y",
    lag_col = NULL,
    level = 0.95
)

Let’s plot the forecast:

plot_lm_forecast(fc2, actual_col = "y")

Looks much better! 😎

Using Grid Search to Identify the Optimal Knots
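The piecewise_regression function below is a workshop helper from functions.R. Conceptually, for each number of knots it fits candidate knot placements and keeps the configuration with the lowest BIC. A minimal single-knot version of that idea (toy data, hypothetical helper name best_knot_bic) might look like:

```r
# Minimal single-knot BIC search on toy data
set.seed(3)
t <- 1:80
y <- 10 + 2 * t - 1.5 * pmax(0, t - 40) + rnorm(80, sd = 3)  # true knot at 40

best_knot_bic <- function(y, t, candidates) {
  # Fit one piecewise regression per candidate knot, keep the lowest BIC
  bics <- sapply(candidates, function(k) BIC(lm(y ~ t + pmax(0, t - k))))
  candidates[which.min(bics)]
}

best_knot_bic(y, t, candidates = 10:70)  # expected to land close to 40
```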

result <- piecewise_regression(
    data = ts1,
    time_col = "index",
    value_col = "y",
    max_knots = 3,
    min_segment_length = 8,
    edge_buffer = 0.05,
    grid_resolution = 20
)
Testing 0 knot(s)...
  Best BIC: 919.28 | RSS: 1.006639e+12 | Tested 1 configurations
Testing 1 knot(s)...
  Best BIC: 858.05 | RSS: 182625404855 | Tested 18 configurations
Testing 2 knot(s)...
  Best BIC: 844.26 | RSS: 115452860424 | Tested 25 configurations
Testing 3 knot(s)...
  Best BIC: 852.94 | RSS: 131838198802 | Tested 5 configurations

Optimal model:  2 knot(s) with BIC = 844.26 
plot_bic_scores(result)
plot_knots(result, time_col = "index", value_col = "y")
# record <- record_piecewise_search(
#     data = ts1,
#     time_col = "index",
#     value_col = "y",
#     max_knots = 3,
#     min_segment_length = 8,
#     edge_buffer = 0.05,
#     grid_resolution = 25,
#     output_dir = "animation_frames",
#     dpi = 100,
#     gif_width = 800
# )
# browseURL(record$gif_path)

AR model

The autoregressive (AR) model is a common approach in time series forecasting for modeling series that are highly correlated with their past lags. It is one of the building blocks of the ARIMA family of models.

At its core, an AR model is a regression of the series on its previous lags. For example, an AR(1) model regresses the series on its first lag:

\[ Y_t = \beta_0 + \beta_1 Y_{t-1} + \epsilon_t \]
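To see the AR(1) idea in action, we can simulate an AR(1) process and recover its coefficient with stats::arima (a sketch independent of the workshop data):

```r
# Simulate an AR(1) process with coefficient 0.7 and recover it
set.seed(11)
x <- arima.sim(model = list(ar = 0.7), n = 300)

ar1_hat <- coef(arima(x, order = c(1, 0, 0)))[["ar1"]]
ar1_hat  # estimate should be near 0.7
```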

Let’s update the above model formula:

\[ Y_t = \beta_0 + \beta_1 trend_t + \beta_2 Y_{t-1} + \epsilon_t \]

And create a lag 1 feature:

ts1$lag1 <- dplyr::lag(ts1$y, n = 1)

ts1
# A tsibble: 38 x 9 [1Y]
   index       y series_id trend     fit1     res1 trend2     fit2    lag1
   <int>   <int> <chr>     <int>    <dbl>    <dbl>  <dbl>    <dbl>   <int>
 1  1987 7904858 SCA           1 8217412. -312554.      0 8005104.      NA
 2  1988 8113034 SCA           2 8311426. -198392.      0 8121250. 7904858
 3  1989 8313776 SCA           3 8405439.  -91663.      0 8237396. 8113034
 4  1990 8497848 SCA           4 8499453.   -1605.      0 8353542. 8313776
 5  1991 8634774 SCA           5 8593467.   41307.      0 8469688. 8497848
 6  1992 8680613 SCA           6 8687480.   -6867.      0 8585835. 8634774
 7  1993 8726187 SCA           7 8781494.  -55307.      0 8701981. 8680613
 8  1994 8790733 SCA           8 8875508.  -84775.      0 8818127. 8726187
 9  1995 8865541 SCA           9 8969521. -103980.      0 8934273. 8790733
10  1996 8969308 SCA          10 9063535.  -94227.      0 9050419. 8865541
# ℹ 28 more rows

Let’s now fit the model:

md3 <- lm(y ~ trend + lag1, data = ts1)

fit3 <- predict(object = md3, newdata = ts1, interval = "confidence", level = 0.95)
ts1$fit3 <- fit3[, 1]

summary(md3)

Call:
lm(formula = y ~ trend + lag1, data = ts1)

Residuals:
    Min      1Q  Median      3Q     Max 
-114148  -35769    3264   30177  126480 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 8.009e+05  4.862e+05   1.647    0.109    
trend       5.153e+03  5.842e+03   0.882    0.384    
lag1        9.184e-01  6.062e-02  15.150   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 58370 on 34 degrees of freedom
  (1 observation deleted due to missingness)
Multiple R-squared:  0.9969,    Adjusted R-squared:  0.9967 
F-statistic:  5423 on 2 and 34 DF,  p-value: < 2.2e-16
p3 <- p |>
    add_lines(x = ts1$index, y = ts1$fit3, mode = "lines", line = list(color = "black", dash = "dash"), name = "Fitted")
p3

Let’s check the model residuals:

plot_residuals(
    data = ts1,
    index_col = "index",
    actual_col = "y",
    fitted_col = "fit3",
    lag_max = 60,
    alpha = 0.05
)

Using lags in our model requires a recursive process to create a forecast when the lag order is smaller than the forecast horizon. In the above example, we trained a model with lag 1, which means we can use the last observation of the actual series as the first lag value in the future data frame:

future_data$lag1 <- c(ts1$y[nrow(ts1)], rep(NA, h - 1))
future_data
# A tsibble: 5 x 7 [1Y]
  index trend      yhat     lower     upper trend2     lag1
  <dbl> <int>     <dbl>     <dbl>     <dbl>  <int>    <int>
1  2025    39 11789931. 11432704. 12147157.     17 11428636
2  2026    40 11883944. 11525312. 12242577.     18       NA
3  2027    41 11977958. 11617854. 12338061.     19       NA
4  2028    42 12071971. 11710334. 12433609.     20       NA
5  2029    43 12165985. 11802750. 12529220.     21       NA

Let’s now create a placeholder for the forecast fields:

future_data$yhat3 <- NA
future_data$lower3 <- NA
future_data$upper3 <- NA

Now we can create the forecast for the first row and update the lag1 field of the second row with the prediction of the first row, and then repeat this process recursively:

fc3_temp <- predict(
    object = md3,
    newdata = future_data[1, ],
    interval = "prediction",
    level = 0.95
)

future_data$yhat3[1] <- fc3_temp[, 1]
future_data$lower3[1] <- fc3_temp[, 2]
future_data$upper3[1] <- fc3_temp[, 3]
future_data$lag1[2] <- future_data$yhat3[1]
future_data
# A tsibble: 5 x 10 [1Y]
  index trend      yhat     lower   upper trend2    lag1   yhat3  lower3  upper3
  <dbl> <int>     <dbl>     <dbl>   <dbl>  <int>   <dbl>   <dbl>   <dbl>   <dbl>
1  2025    39 11789931. 11432704.  1.21e7     17  1.14e7  1.15e7  1.14e7  1.16e7
2  2026    40 11883944. 11525312.  1.22e7     18  1.15e7 NA      NA      NA     
3  2027    41 11977958. 11617854.  1.23e7     19 NA      NA      NA      NA     
4  2028    42 12071971. 11710334.  1.24e7     20 NA      NA      NA      NA     
5  2029    43 12165985. 11802750.  1.25e7     21 NA      NA      NA      NA     

Let’s generalize it with a for loop:

min_lag <- 1

for (i in (min_lag + 1):h) {
    fc3_temp <- predict(
        object = md3,
        newdata = future_data[i, ],
        interval = "prediction",
        level = 0.95
    )
    future_data$yhat3[i] <- fc3_temp[, 1]
    future_data$lower3[i] <- fc3_temp[, 2]
    future_data$upper3[i] <- fc3_temp[, 3]
    if (i < h) {
        future_data$lag1[i + 1] <- future_data$yhat3[i]
    }
}

future_data
# A tsibble: 5 x 10 [1Y]
  index trend      yhat     lower     upper trend2     lag1  yhat3 lower3 upper3
  <dbl> <int>     <dbl>     <dbl>     <dbl>  <int>    <dbl>  <dbl>  <dbl>  <dbl>
1  2025    39 11789931. 11432704. 12147157.     17   1.14e7 1.15e7 1.14e7 1.16e7
2  2026    40 11883944. 11525312. 12242577.     18   1.15e7 1.16e7 1.14e7 1.17e7
3  2027    41 11977958. 11617854. 12338061.     19   1.16e7 1.16e7 1.15e7 1.18e7
4  2028    42 12071971. 11710334. 12433609.     20   1.16e7 1.17e7 1.16e7 1.18e7
5  2029    43 12165985. 11802750. 12529220.     21   1.17e7 1.18e7 1.16e7 1.19e7

Likewise, we can use the lm_forecast function to create the forecast:

fc3 <- lm_forecast(
    model = md3,
    actual = ts1,
    future_data = future_data,
    actual_col = "y",
    lag_col = "lag1",
    level = 0.95
)

Let’s plot the forecast:

plot_lm_forecast(fc3, actual_col = "y")