Time series forecasting of Wikipedia page view traffic using ARIMA, SARIMAX, and Facebook Prophet — comparing models to find the most accurate predictor for ad spend optimisation.
Ad Ease is a digital advertising platform, and Wikipedia is one of its key inventory sources. Ad prices on Wikipedia fluctuate with traffic: the higher the traffic, the more expensive the ad slots. If Ad Ease can forecast Wikipedia traffic accurately, it can time ad purchases to minimise cost and maximise reach.
| Component | What was found | Implication |
|---|---|---|
| Trend | Upward growth over time, sharp jump mid-2016 | Event/campaign effect — not just organic growth |
| Seasonality | Strong weekly pattern, consistent amplitude | Additive decomposition model is appropriate |
| Residuals | Mostly random, spikes at major events | Unexplained variance is event-driven |
| Stationarity (ADF) | p = 0.1895 (> 0.05), so the unit-root null cannot be rejected: NOT stationary | First-order differencing (d=1) needed before ARIMA |
```python
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# ADF test on the original series
adf_result = adfuller(en_ts)
print("p-value:", adf_result[1])  # 0.1895 → NOT stationary

# First differencing achieves stationarity
en_diff = en_ts.diff().dropna()
adf2 = adfuller(en_diff)
print("p-value after diff:", adf2[1])  # ≈0 → stationary ✓

# ARIMA model (d=1 confirmed)
model = ARIMA(train, order=(5, 1, 2))
result = model.fit()
forecast = result.forecast(steps=len(test))
# MAPE: 6.80% ✅
```
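The MAPE quoted for the ARIMA forecast is the mean absolute percentage error between the held-out values and the forecast. The source does not show how it was computed, so the helper below is a hypothetical sketch (the name `mape` and the toy numbers are assumptions) using the standard definition.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

# Toy values: relative errors are 10%, 5%, and 0%, so the mean is 5%.
actual = [100, 200, 400]
predicted = [110, 190, 400]
score = mape(actual, predicted)
print(f"MAPE: {score:.2f}%")
```

With the snippet above, the equivalent call would be `mape(test, forecast)`, assuming `test` holds the actual values for the forecast horizon.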
| Model | MAPE (English) | Result | Notes |
|---|---|---|---|
| ARIMA | 6.80% | ✅ Within target | Strong baseline; differencing (d=1) handles the trend, though the order (5,1,2) has no explicit seasonal term |
| SARIMAX | 10.47% | ❌ Higher error | Campaign effect didn't align with this window |
| Prophet | 6.65% | ✅ Best performer | Handles event spikes best — automatic seasonality |