Time Series · ARIMA · Completed

Ad Ease
Wikipedia Traffic Forecasting

Time series forecasting of Wikipedia page view traffic using ARIMA, SARIMAX, and Facebook Prophet — comparing models to find the most accurate predictor for ad spend optimisation.

Type: Time Series · ARIMA · Prophet
Domain: AdTech / Digital Marketing
Dataset: Wikipedia page views · Multiple languages
Tools: Python · Statsmodels · Prophet · ARIMA
Course: Scaler Academic Case Study
Best MAPE (Prophet): 6.65%
ARIMA MAPE (English): 6.80%
ARIMA MAPE (Japanese): 8.75%
Models Compared: 3
01 — Business Problem

When should Ad Ease buy ad space?

Ad Ease is a digital advertising platform. Wikipedia is one of its key inventory sources. Ad prices on Wikipedia fluctuate with traffic — high traffic = expensive ad slots. If Ad Ease can forecast Wikipedia traffic accurately, it can time ad purchases to minimise cost and maximise reach.

📈
The forecasting challenge
Wikipedia page views show strong weekly seasonality, long-term trend, and event-driven spikes. A good forecast model must handle all three — not just the smooth trend.
02 — Time Series Decomposition

Understanding the signal structure

| Component | What was found | Implication |
| --- | --- | --- |
| Trend | Upward growth over time, sharp jump mid-2016 | Event/campaign effect, not just organic growth |
| Seasonality | Strong weekly pattern, consistent amplitude | Additive decomposition model is appropriate |
| Residuals | Mostly random, spikes at major events | Unexplained variance is event-driven |
| Stationarity (ADF) | p = 0.1895 (not stationary) | First-order differencing (d=1) needed before ARIMA |
Python — stationarity_and_arima.py
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

# en_ts: daily English page-view series (date-indexed pandas Series), loaded earlier
# train / test: chronological split of en_ts, prepared earlier

# ADF test on the original series
adf_result = adfuller(en_ts)
print("p-value:", adf_result[1])  # 0.1895 → NOT stationary

# First differencing achieves stationarity
en_diff = en_ts.diff().dropna()
adf2 = adfuller(en_diff)
print("p-value after diff:", adf2[1])  # ≈0 → stationary ✓

# ARIMA model (d=1 confirmed); the model differences internally, so pass the raw series
model = ARIMA(train, order=(5, 1, 2))
result = model.fit()
forecast = result.forecast(steps=len(test))
# MAPE: 6.80% ✅
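The MAPE quoted in the last comment depends on two steps the snippet does not show: a chronological train/test split and the error metric itself. The sketch below is one plausible way to write both. The helper names `mape` and `chronological_split` are hypothetical, and the numbers are toy values, not the Wikipedia data.

```python
import numpy as np
import pandas as pd

def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no zeros in `actual`."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((actual - predicted) / actual)) * 100)

def chronological_split(series, n_test):
    """Hold out the last `n_test` observations; a time series is never shuffled."""
    return series.iloc[:-n_test], series.iloc[-n_test:]

# Toy values for illustration only
series = pd.Series([100.0, 110.0, 120.0, 130.0, 200.0, 250.0])
train, test = chronological_split(series, n_test=2)
predicted = [180.0, 275.0]  # pretend model output for the two held-out points

score = mape(test, predicted)
print(f"MAPE: {score:.2f}%")  # MAPE: 10.00%
```

Converting both inputs to NumPy arrays sidesteps index alignment, which matters when the forecast carries a date index that differs from the test set's.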
03 — Model Comparison

ARIMA vs SARIMAX vs Prophet

| Model | MAPE (English) | Result | Notes |
| --- | --- | --- | --- |
| ARIMA | 6.80% | ✅ Within target | Strong baseline; handles trend + seasonality |
| SARIMAX | 10.47% | ❌ Higher error | Campaign effect didn't align with this window |
| Prophet | 6.65% | ✅ Best performer | Handles event spikes best; automatic seasonality |
🏆
Prophet wins for English, ARIMA for Japanese
Prophet handles the event-driven spikes in English Wikipedia traffic better. Japanese traffic is smoother with fewer spikes — ARIMA's simpler structure is sufficient.
04 — Key Findings

Business implications for Ad Ease

📅
Weekly seasonality is predictable
Strong, consistent weekly cycles mean Ad Ease can plan ad buys around predictable low-traffic (low-cost) windows — typically midweek vs weekend.
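One simple way to locate those windows is a day-of-week profile: average the views per weekday and pick the minimum. The sketch below uses synthetic data in which weekends are set lower purely for illustration; it makes no claim about which days are actually quietest on Wikipedia.

```python
import numpy as np
import pandas as pd

# Eight full weeks of synthetic daily views; 2016-01-04 is a Monday.
idx = pd.date_range("2016-01-04", periods=56, freq="D")
weekday_level = np.array([1000.0, 980.0, 960.0, 970.0, 990.0, 820.0, 780.0])  # Mon..Sun
views = pd.Series(weekday_level[idx.dayofweek.to_numpy()], index=idx)

# Average views per weekday, ordered Monday to Sunday.
order = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
profile = views.groupby(views.index.day_name()).mean().reindex(order)

lowest_day = profile.idxmin()
print(profile)
print("Lowest-traffic day:", lowest_day)  # Lowest-traffic day: Sunday
```

On real data the same profile could be read off the seasonal component of the decomposition instead of raw averages, which removes the influence of the trend.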
🌍
English traffic is event-driven
Spikes on English pages are event-triggered (news, campaigns). Ad Ease should monitor news calendars to anticipate and exploit sudden traffic surges.
🔮
Prophet is the production model
6.65% MAPE on English is excellent for ad spend forecasting. At this accuracy, Ad Ease can confidently model cost-per-impression 2–4 weeks ahead.
🗾
Japanese requires separate model
Japanese Wikipedia traffic has different patterns. A separate ARIMA model with language-specific parameters outperforms a one-size-fits-all approach.
05 — Tech Stack
Python 3 · Pandas · Statsmodels · ARIMA · SARIMAX · Facebook Prophet · ADF Test · MAPE