The ability to predict sales accurately matters to every retailer. It allows them to make sound investment decisions and to adjust to ever-changing market conditions. A data-driven company needs forecasting tools that are accurate. It is just as important that they are scalable, that is, capable of processing growing amounts of data at a reasonable pace. Rapid advances in computing technology have made collecting and storing data easy, yet models designed some time ago struggle to extract valuable insights from such large volumes of data. Inspired by recent scientific developments in the field, we conducted our own research, which led us to design an accurate and scalable forecasting framework. Let’s take a look at how we did it!

How does one predict demand? First, it is important to note that sales data come in the form of a panel, that is: they consist of two dimensions. One of them is time. The other is categories: data can be collected on different products or groups of products, different store locations or different regions or countries. The demand pattern for one category is often informative about the one for the others. For this reason, they ought to be analysed jointly. A typical statistical model aims at describing sales levels with four groups of factors:

- Past sales, providing information about the general trend, seasonal (weekly, monthly) patterns and holiday effects;
- External factors, including weather, market movements, competitors’ actions or the state of the economy, for instance;
- Category-specific factor, accounting for the fact that some categories may be different than others: a store employing a superb (or terrible) salesman, located in an (un)favourable spot, visited by a particular type of clientele;
- Unexpected events, covering extreme weather conditions forcing potential clients to stay home, a car accident blocking the only access to the store, a meteorite crushing the warehouse and destroying the stock, or just the pure coincidence of exceptionally many people deciding to go shopping at the same time.

The model tries to estimate the impact of the first three factors in such a way that the last one turns out to be negligible.

So far so good. Unfortunately, at this point statistical theory gets in the way. If one wants to include both past sales and the category-specific factor in the model, which is important here, the standard estimation results are no longer reliable (scroll down for some math to see why!). The typical way out of this predicament is to forecast not the sales levels themselves but the changes in sales, that is, the increase or decrease compared to the previous period. This is fine statistically, but it removes the category-specific factor (which is assumed to be constant over time) from the model. Methods based on this approach have been accepted by statisticians as the best way to estimate the impact of external factors.

But the category-specific factor might improve prediction accuracy! Wouldn’t it be great to include it in a theoretically sound way? This is what Bayesian statistics achieves: it provides a way to obtain the entire distribution of the category-specific impact. However, the Bayesian approach relies on time-consuming simulations and can be extremely slow. This drawback can be disqualifying if loads of data arrive at the end of the day and the predictions and subsequent analyses are expected by the following morning.

This is where our prediction framework comes to save the day. It encompasses two separate steps. In the first step, the impacts of past sales and external factors are estimated following one of the typically applied methods, disregarding the category-specific factor for a while. In the second step, we develop a statistical procedure based on the so-called Tweedie’s Formula to re-obtain the category-specific impact and use it to adjust the predicted values.
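For readers who have not met it before, Tweedie’s Formula in its textbook Gaussian form is shown below. The symbols $z_i$, $\mu_i$, $\sigma^2$ and $f$ are notation introduced here purely for illustration; this article does not spell out how the framework applies the formula in detail.

$$\mathbb{E}[\mu_i \mid z_i] = z_i + \sigma^{2} \, \frac{d}{dz} \log f(z) \Big|_{z = z_i}, \qquad z_i \mid \mu_i \sim \mathcal{N}(\mu_i, \sigma^{2})$$

Here $z_i$ is a noisy estimate of an unobserved quantity $\mu_i$ (in this setting, the category-specific impact) and $f$ is the marginal density of the $z_i$ across categories. The correction term depends only on $f$, which can be estimated directly from the observed $z_i$.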

The figure above benchmarks the performance of our forecasting framework against the most popular dynamic panel set-up on a real sales data set. The bars show the prediction error (smaller error means better model!) when forecasting from one up to five periods ahead. For each horizon, three models are compared. The benchmark is the Blundell-Bond model, also known as System GMM and labelled “SYSGMM” in the plot. The other two models, QMLEpar and SSYS3par, are different implementations of the Veneficus forecasting framework.

It is not surprising that the prediction error increases with the time horizon: forecast accuracy tends to deteriorate with time for all models, as the distant future is more uncertain than the near one. It is noteworthy, however, that the extent of this deterioration is considerably smaller for our framework. This means that the gains from using the Veneficus framework instead of the current state-of-the-art method increase with the forecast horizon! This conclusion, found empirically in the figure above, is also validated by a set of simulation studies. Thus, we are proud to say that the Veneficus forecasting framework allows our clients to get more out of their data!

Let us finish off with something for the mathematically inclined readers: a look at some equations! Let us follow the convention of denoting our dependent variable, that is sales, by $y$. Because it takes different values for different categories and different points in time, we index it with $i$ and $t$ respectively: $y_{i,t}$. The first shot at a forecasting model would look somewhat like this:

$$y_{i,t} = \alpha_{0,i} + \alpha_{1} y_{i,t-1} + \alpha_{2} x_{i,t-1} + \varepsilon_{i,t}$$

where $\alpha_{0,i}$ denotes the category-specific factor ($i$ takes a different value for each category), $\alpha_{1}$ is the impact of past sales $y_{i,t-1}$ and $\alpha_{2}$ is the impact of external factors $x_{i,t-1}$. The last term in the equation, $\varepsilon_{i,t}$, denotes the impact of unexpected events.
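To make the equation concrete, here is a minimal Python sketch that simulates data from it and produces a one-step-ahead forecast. The function names, parameter values and distributions are assumptions chosen for illustration; they are not taken from the real sales data or from the framework itself.

```python
import numpy as np

def simulate_panel(n_categories=200, n_periods=10, alpha1=0.5, alpha2=0.3, seed=0):
    """Simulate y[i,t] = alpha0[i] + alpha1*y[i,t-1] + alpha2*x[i,t-1] + eps[i,t].

    All distributions and default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    alpha0 = rng.normal(0.0, 1.0, size=n_categories)          # category-specific factor
    x = rng.normal(0.0, 1.0, size=(n_categories, n_periods))  # external factor
    eps = rng.normal(0.0, 1.0, size=(n_categories, n_periods))  # unexpected events
    y = np.zeros((n_categories, n_periods))
    y[:, 0] = alpha0 / (1.0 - alpha1) + eps[:, 0]  # start near each category's long-run mean
    for t in range(1, n_periods):
        y[:, t] = alpha0 + alpha1 * y[:, t - 1] + alpha2 * x[:, t - 1] + eps[:, t]
    return y, x, alpha0

def one_step_forecast(alpha0, alpha1, alpha2, y_last, x_last):
    """Forecast next-period sales; the unexpected-events term is set to its mean, zero."""
    return alpha0 + alpha1 * y_last + alpha2 * x_last

y, x, alpha0 = simulate_panel()
forecast = one_step_forecast(alpha0, 0.5, 0.3, y[:, -1], x[:, -1])
print(forecast[:3])
```

In practice the three alphas are unknown and have to be estimated, which is where the difficulties described in the rest of this section come from.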

A typical panel data model requires an assumption about $\alpha_{0,i}$: either it is correlated with the external factors $x$ or it is not. In this case one would go for the former, as it is the only way to obtain category-specific effects. This results in the fixed effects model. To estimate its parameters, one performs the so-called within-transformation, which boils down to subtracting the category mean from each variable:

$$y_{i,t} - \bar{y}_{i} = (\alpha_{0,i} - \bar{\alpha}_{0,i}) + \alpha_{1} (y_{i,t-1} - \bar{y}_{i}) + \alpha_{2} (x_{i,t-1} - \bar{x}_{i}) + (\varepsilon_{i,t} - \bar{\varepsilon}_{i})$$

This method works in most cases and would also work here if not for the past sales on the right-hand side of the equation. Unfortunately, the term $(y_{i,t-1} - \bar{y}_{i})$ is correlated with the error, or unexpected-events impact, $(\varepsilon_{i,t} - \bar{\varepsilon}_{i})$: the category mean $\bar{y}_{i}$ is built from sales in every period, so it contains the very shocks that make up the error. This violates the statistical requirement that no right-hand side variable be correlated with the error term. Breaking this rule produces biased results. An easy solution would be to remove the past sales from the model, but this means throwing away a lot of useful information! Instead, one may consider a different transformation in place of the within one, for example differencing, that is, subtracting from each variable its own value from the previous time period:
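The bias caused by the within-transformation is easy to see in a small simulation. The sketch below is an illustration under assumed values (2000 categories, 6 usable periods, a true $\alpha_{1}$ of 0.5, no external factors); it is not part of the forecasting framework.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, alpha1 = 2000, 6, 0.5   # assumed illustrative values

# Simulate y[i,t] = alpha0[i] + alpha1 * y[i,t-1] + eps[i,t]
alpha0 = rng.normal(size=n)
y = np.zeros((n, T + 1))
y[:, 0] = alpha0 / (1 - alpha1) + rng.normal(size=n)
for t in range(1, T + 1):
    y[:, t] = alpha0 + alpha1 * y[:, t - 1] + rng.normal(size=n)

y_cur, y_lag = y[:, 1:], y[:, :-1]

# Within-transformation: subtract each category's own mean over time
y_cur_w = y_cur - y_cur.mean(axis=1, keepdims=True)
y_lag_w = y_lag - y_lag.mean(axis=1, keepdims=True)

# Ordinary least squares on the transformed data
alpha1_within = (y_lag_w * y_cur_w).sum() / (y_lag_w ** 2).sum()

print(f"true alpha1 = {alpha1}, within estimate = {alpha1_within:.3f}")
```

With so few time periods the estimate lands well below the true value, and adding more categories does not help: the bias only fades as the number of time periods grows.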

$$y_{i,t} - y_{i,t-1} = (\alpha_{0,i} - \alpha_{0,i}) + \alpha_{1} (y_{i,t-1} - y_{i,t-2}) + \alpha_{2} (x_{i,t-1} - x_{i,t-2}) + (\varepsilon_{i,t} - \varepsilon_{i,t-1}).$$

This approach allows one to employ a technique called “method of moments estimation” to account for the inconvenient correlation and produce reliable results. However, a watchful eye will notice that $\alpha_{0,i} - \alpha_{0,i}$ yields zero, which means that we have simply removed the category-specific effect!
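As an illustration of the idea, the sketch below implements the simplest member of this family, the Anderson-Hsiao instrumental-variable estimator: the differenced past sales $y_{i,t-1} - y_{i,t-2}$ are instrumented with the level $y_{i,t-2}$, which is unrelated to the differenced error. This is a stand-in chosen for brevity, not necessarily the variant used in our framework, and the simulated data and parameter values are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T, alpha1, alpha2 = 5000, 8, 0.5, 0.3   # assumed illustrative values

# Simulate y[i,t] = alpha0[i] + alpha1*y[i,t-1] + alpha2*x[i,t-1] + eps[i,t]
alpha0 = rng.normal(size=n)
x = rng.normal(size=(n, T))
y = np.zeros((n, T))
y[:, 0] = alpha0 / (1 - alpha1) + rng.normal(size=n)
for t in range(1, T):
    y[:, t] = alpha0 + alpha1 * y[:, t - 1] + alpha2 * x[:, t - 1] + rng.normal(size=n)

# First differences: alpha0 drops out of the equation
dy = (y[:, 2:] - y[:, 1:-1]).ravel()       # y[t]   - y[t-1]
dy_lag = (y[:, 1:-1] - y[:, :-2]).ravel()  # y[t-1] - y[t-2]
dx_lag = (x[:, 1:-1] - x[:, :-2]).ravel()  # x[t-1] - x[t-2]
y_lag2 = y[:, :-2].ravel()                 # instrument: the level y[t-2]

X = np.column_stack([dy_lag, dx_lag])      # regressors
Z = np.column_stack([y_lag2, dx_lag])      # instruments (x serves as its own instrument)

# Just-identified method of moments: solve Z'(dy - X b) = 0 for b
alpha1_hat, alpha2_hat = np.linalg.solve(Z.T @ X, Z.T @ dy)

print(f"alpha1_hat = {alpha1_hat:.3f}, alpha2_hat = {alpha2_hat:.3f}")
```

More elaborate variants, such as the Blundell-Bond estimator mentioned earlier, use more instruments and are typically more precise, but the principle is the same.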

In our forecasting framework, we use one of the method of moments techniques (there exist multiple variations!) to reliably estimate $\alpha_{1}$, the impact of past sales, and $\alpha_{2}$, the impact of external factors, disregarding $\alpha_{0,i}$ for a while. Having obtained those two, we plug them into our formulas and algorithms, which finally produce $\alpha_{0,i}$, the lost category-specific impact. All the alphas then serve the purpose of providing our clients with state-of-the-art sales predictions.
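To give a flavour of what such a second step can look like, here is a simplified sketch and not our production algorithm. It assumes Gaussian unexpected events, uses a plain kernel density estimate, and, to stay self-contained, plugs in the true $\alpha_{1}$ and $\alpha_{2}$ as stand-ins for the first-step estimates. The function name `tweedie_category_effects` and all parameter values are hypothetical.

```python
import numpy as np

def tweedie_category_effects(y, x, alpha1_hat, alpha2_hat):
    """Recover category effects from residual means, then apply a Tweedie-style correction."""
    # What is left after removing the past-sales and external-factor impacts
    resid = y[:, 1:] - alpha1_hat * y[:, :-1] - alpha2_hat * x[:, :-1]
    n, T = resid.shape
    z = resid.mean(axis=1)                   # raw estimate: alpha0[i] plus averaged noise
    sigma2 = ((resid - z[:, None]) ** 2).sum() / (n * (T - 1))
    v = sigma2 / T                           # noise variance of each z[i]

    # Gaussian kernel density estimate of the z's, rule-of-thumb bandwidth (an assumption)
    h = 1.06 * z.std() * n ** (-0.2)
    diff = z[None, :] - z[:, None]           # diff[i, j] = z[j] - z[i]
    w = np.exp(-0.5 * (diff / h) ** 2)
    score = (w * diff).sum(axis=1) / (h ** 2 * w.sum(axis=1))  # d/dz log f at z[i]

    return z, z + v * score                  # raw and Tweedie-corrected estimates

# Simulated data with assumed illustrative values
rng = np.random.default_rng(3)
n, T, alpha1, alpha2 = 1000, 5, 0.5, 0.3
alpha0 = rng.normal(size=n)
x = rng.normal(size=(n, T))
y = np.zeros((n, T))
y[:, 0] = alpha0 / (1 - alpha1) + rng.normal(size=n)
for t in range(1, T):
    y[:, t] = alpha0 + alpha1 * y[:, t - 1] + alpha2 * x[:, t - 1] + rng.normal(size=n)

alpha0_raw, alpha0_tweedie = tweedie_category_effects(y, x, alpha1, alpha2)
mse_raw = np.mean((alpha0_raw - alpha0) ** 2)
mse_tweedie = np.mean((alpha0_tweedie - alpha0) ** 2)
print(f"MSE raw = {mse_raw:.3f}, MSE Tweedie-corrected = {mse_tweedie:.3f}")

# Adjusted one-step-ahead forecast using all three alphas
forecast = alpha0_tweedie + alpha1 * y[:, -1] + alpha2 * x[:, -1]
```

The correction pulls each noisy category estimate towards the regions where most categories lie, which is why it tends to reduce the error when each category has only a few time periods of data.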