Walmart Black Friday Purchase Behaviour

Statistical analysis of Black Friday transaction data to understand how gender, age, city, and marital status influence customer spending — and what Walmart should do about it.

Type: EDA & Statistical Analysis

Domain: Retail / E-commerce

Dataset: 537,577 transactions

Tools: Python · Pandas · Scipy · Seaborn

Course: Scaler Academic Case Study

Who spends the most on Black Friday?

Walmart wants to understand Black Friday purchase behaviour across demographic segments. The business question: do men and women spend differently? Does city type matter? Does age? The answers directly shape targeted promotions, inventory allocation, and marketing spend for the next Black Friday campaign.

🛒

The hypothesis

Walmart's management believes male customers spend more than female customers. The task is to test this statistically — not just describe it — and build confidence intervals to support business decisions.

What the numbers say

Segment	Avg Purchase (₹)	Key Insight
Male	9,504	Higher spender
Female	8,734	Lower spender
Age 51–55	~9,900	Highest age group
Age 0–17	~8,100	Lowest age group
City C	~9,700	Highest city
City A	~9,100	Second highest
Single	~9,280	Slightly higher
Married	~9,250	Very close

⚠️

The outlier issue

Purchases above ₹20,000 are flagged as statistical outliers by the IQR method. These rare but high-value transactions pull the mean upward, so the median is a more robust measure of central tendency here.
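To make the mean-versus-median point concrete, here is a small sketch on made-up purchase amounts (illustrative values only, not taken from the Walmart data). It adds two large transactions and compares how far each statistic moves.

```python
import numpy as np

# Made-up purchase amounts for illustration only.
typical = np.array([7000, 8000, 8500, 9000, 9500, 10000, 11000])

# The same purchases plus two rare, very large transactions.
with_outliers = np.append(typical, [23000, 23500])

# How far does each statistic move once the large values are included?
mean_shift = np.mean(with_outliers) - np.mean(typical)
median_shift = np.median(with_outliers) - np.median(typical)

print(f"mean shift:   {mean_shift:.0f}")
print(f"median shift: {median_shift:.0f}")
```

In this toy sample the mean moves by several thousand while the median moves by only 500, which is the behaviour the note above describes.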

Statistical approach

Outlier Detection via IQR

Used boxplot + IQR to identify purchases beyond ₹20,000. Flagged as outliers but retained — they represent real transactions.

IQR · boxplot · quantile()
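A minimal sketch of the IQR rule, run on a small synthetic series rather than the real transactions. The helper name `iqr_bounds` is hypothetical, and the 1.5 × IQR fence multiplier is the conventional default, assumed here because the write-up does not state which multiplier was used.

```python
import pandas as pd

def iqr_bounds(series: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Synthetic purchases for illustration only (not the Walmart data).
purchases = pd.Series([5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 23000])

lower, upper = iqr_bounds(purchases)
is_outlier = (purchases < lower) | (purchases > upper)

# Flag rather than drop, so the rows stay available for later analysis.
flagged = pd.DataFrame({"Purchase": purchases, "is_outlier": is_outlier})
print(f"fences: ({lower}, {upper})")
print(flagged[flagged["is_outlier"]])
```

Keeping a boolean flag column instead of filtering matches the decision to retain the outliers as real transactions.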

Central Limit Theorem Application

With 537K records, the CLT applies: the sampling distribution of the sample mean is approximately normal, even if individual purchase amounts are skewed. Used this to build confidence intervals for the population mean purchase amount.

scipy.stats · CLT
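The CLT claim can be checked with a quick simulation. The sketch below assumes a right-skewed synthetic population (a gamma distribution chosen purely for illustration, not fitted to the Walmart data), draws repeated samples, and compares the spread of the sample means with the CLT prediction of sigma / sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic right-skewed "purchase" population (assumption: gamma-shaped).
population = rng.gamma(shape=2.0, scale=4500.0, size=200_000)

n = 500            # observations per sample
n_samples = 2_000  # number of repeated samples

# Draw repeated samples (with replacement) and record each sample mean.
sample_means = rng.choice(population, size=(n_samples, n)).mean(axis=1)

# CLT prediction: sample means centre on the population mean with
# standard error sigma / sqrt(n).
predicted_se = population.std() / np.sqrt(n)
observed_se = sample_means.std()

print(f"population mean: {population.mean():.1f}")
print(f"mean of means:   {sample_means.mean():.1f}")
print(f"predicted SE:    {predicted_se:.1f}")
print(f"observed SE:     {observed_se:.1f}")
```

A histogram of `sample_means` should look roughly bell-shaped even though the population itself is skewed.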

Confidence Interval Analysis

Built 90%, 95%, and 99% CIs for male vs female purchase means. Checked whether the intervals overlap: if they do not, the difference between the two means is statistically significant at that confidence level.

stats.t.interval() · sem()
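One way to organise the three confidence levels and the overlap check is sketched below. The helper names `mean_ci` and `intervals_overlap` are hypothetical, and the two groups are synthetic normal samples standing in for the male and female purchase columns.

```python
import numpy as np
from scipy import stats

def mean_ci(values, confidence):
    """t-based confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    return stats.t.interval(
        confidence,
        len(values) - 1,           # degrees of freedom
        loc=values.mean(),
        scale=stats.sem(values),   # standard error of the mean
    )

def intervals_overlap(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic stand-ins for the two groups (not the Walmart data).
rng = np.random.default_rng(0)
group_a = rng.normal(loc=9500, scale=5000, size=5000)
group_b = rng.normal(loc=8700, scale=4700, size=5000)

for level in (0.90, 0.95, 0.99):
    ci_a = mean_ci(group_a, level)
    ci_b = mean_ci(group_b, level)
    print(level, ci_a, ci_b, "overlap:", intervals_overlap(ci_a, ci_b))
```

Non-overlapping intervals are a conservative signal: they imply a significant difference, but overlapping intervals do not by themselves prove the absence of one.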

Multi-Dimensional Segmentation

Grouped by gender × age, gender × city, and marital status × gender. Compared average purchases across all combinations using grouped bar charts.

groupby · barplot(hue)
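A minimal sketch of one such grouping, gender × city, on a tiny made-up frame. `Gender` and `Purchase` appear in the case study's code; `City_Category` is an assumed column name, and all values are illustrative.

```python
import pandas as pd

# Tiny synthetic frame; values are made up, and City_Category is an
# assumed column name.
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "F", "M", "F", "M", "F"],
    "City_Category": ["A", "B", "A", "B", "A", "A", "B", "B"],
    "Purchase": [9000, 10000, 8000, 9000, 11000, 8500, 9500, 8700],
})

# Mean purchase for every gender x city combination:
# one row per gender, one column per city.
segment_means = (
    df.groupby(["Gender", "City_Category"])["Purchase"]
      .mean()
      .unstack("City_Category")
)
print(segment_means)
```

The same long-format frame can be passed to seaborn for the grouped bar chart, for example `sns.barplot(data=df, x="City_Category", y="Purchase", hue="Gender")`.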

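Python: confidence_intervals.py

from scipy import stats

# Assumes `df` is the Black Friday transactions DataFrame, already loaded
# (for example with pandas.read_csv), with 'Gender' and 'Purchase' columns.
male_purchases   = df[df['Gender'] == 'M']['Purchase']
female_purchases = df[df['Gender'] == 'F']['Purchase']

# 95% confidence interval for the mean male purchase
ci_male = stats.t.interval(
    0.95,
    df=len(male_purchases) - 1,        # degrees of freedom (keyword, not the DataFrame)
    loc=male_purchases.mean(),
    scale=stats.sem(male_purchases)    # standard error of the mean
)

# 95% confidence interval for the mean female purchase
ci_female = stats.t.interval(
    0.95,
    df=len(female_purchases) - 1,
    loc=female_purchases.mean(),
    scale=stats.sem(female_purchases)
)

# Reported output:
# Male CI:   (9487.2, 9520.8)
# Female CI: (8719.4, 8748.6)  -> no overlap, so the difference is significant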
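As a complementary check that is not part of the original analysis, a two-sample test compares the group means directly instead of relying on interval overlap. The sketch below applies Welch's t-test (which does not assume equal variances) to synthetic stand-ins for the two groups; the sample sizes and distributions are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Synthetic stand-ins for the two groups (not the Walmart data).
male = rng.normal(loc=9500, scale=5000, size=4000)
female = rng.normal(loc=8700, scale=4700, size=4000)

# Welch's t-test: two-sided, unequal variances allowed.
result = stats.ttest_ind(male, female, equal_var=False)

print(f"t = {result.statistic:.2f}, p = {result.pvalue:.3g}")
```

A small p-value here points the same way as non-overlapping confidence intervals: the difference in means is unlikely to be due to sampling noise alone.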

Business implications

👨

Males spend ~₹770 more on average

The confidence intervals for male and female purchases do not overlap — the difference is statistically significant, not just chance.

🏙️

City C has the highest spenders

Customers from City C (likely Tier-2 cities) spend more per transaction than those from City A or B, despite the common assumption that incomes there are lower.

👴

51–55 age group spends most

Mid-career, higher-income customers. This segment deserves premium product placement and targeted promotions.

💍

Marital status barely matters

Singles spend slightly more but the difference is marginal. Not a useful segmentation axis for Walmart's marketing.
