EDA · Statistical Analysis · Completed

Walmart Black Friday
Purchase Behaviour

Statistical analysis of Black Friday transaction data to understand how gender, age, city, and marital status influence customer spending — and what Walmart should do about it.

Type: EDA & Statistical Analysis
Domain: Retail / E-commerce
Dataset: 537,577 transactions
Tools: Python · Pandas · Scipy · Seaborn
Course: Scaler Academic Case Study
Transactions: 537K
Avg Purchase Amount: ₹9,264
Highest Spending Age Group: 51–55
Highest Avg Purchase: City C
01 — Business Problem

Who spends the most on Black Friday?

Walmart wants to understand Black Friday purchase behaviour across demographic segments. The business question: do men and women spend differently? Does city type matter? Does age? The answers directly shape targeted promotions, inventory allocation, and marketing spend for the next Black Friday campaign.

🛒
The hypothesis
Walmart's management believes male customers spend more than female customers. The task is to test this statistically — not just describe it — and build confidence intervals to support business decisions.
02 — Key Statistical Findings

What the numbers say

Segment      Avg Purchase (₹)   Key Insight
Male         9,504              Higher spender
Female       8,734              Lower spender
Age 51–55    ~9,900             Highest age group
Age 0–17     ~8,100             Lowest age group
City C       ~9,700             Highest city
City A       ~9,100             Second highest
Single       ~9,280             Slightly higher
Married      ~9,250             Very close
⚠️
The outlier issue
Purchases above ₹20,000 are statistical outliers under the IQR method. These rare, high-value transactions pull the mean upward, so the median is a more robust measure of central tendency here.
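The IQR rule mentioned above can be sketched in a few lines. This is a minimal illustration on hypothetical toy values, not the Walmart data; the helper name `iqr_outlier_bounds` and the conventional multiplier `k = 1.5` are assumptions, since the write-up does not state which multiplier was used.

```python
import pandas as pd


def iqr_outlier_bounds(values: pd.Series, k: float = 1.5) -> tuple[float, float]:
    """Return the (lower, upper) fences of the IQR rule for a numeric Series."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical toy purchases: eight ordinary values and one very large one.
purchases = pd.Series([5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 40000])

lower, upper = iqr_outlier_bounds(purchases)
outliers = purchases[(purchases < lower) | (purchases > upper)]

print(f"fences: ({lower:.0f}, {upper:.0f})")
print(f"outliers: {outliers.tolist()}")
# One extreme value moves the mean well away from the median.
print(f"mean: {purchases.mean():.0f}, median: {purchases.median():.0f}")
```

In this toy series the single large value lifts the mean to 12,000 while the median stays at 9,000, which is the same effect the outlier note describes.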
03 — Methodology

Statistical approach

01
Outlier Detection via IQR
Used a boxplot and the IQR rule to identify purchases above ₹20,000. These were flagged as outliers but retained, since they represent real transactions.
IQR · boxplot · quantile()
02
Central Limit Theorem Application
With 537K records, the Central Limit Theorem applies: the sampling distribution of the mean is approximately normal, even if the purchase amounts themselves are not. Used this to build confidence intervals for the population mean purchase amount.
scipy.stats · CLT
03
Confidence Interval Analysis
Built 90%, 95%, and 99% CIs for male vs female purchase means. Checked whether the intervals overlap to judge whether the difference is statistically meaningful.
stats.t.interval() · sem()
04
Multi-Dimensional Segmentation
Grouped by gender × age, gender × city, and marital status × gender. Compared average purchases across all combinations using grouped bar charts.
groupby · barplot(hue)
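The segmentation step can be sketched with a two-key `groupby`. The rows below are hypothetical toy data, and the column name `City_Category` is an assumption (only `Gender` and `Purchase` appear in the project's code).

```python
import pandas as pd

# Hypothetical toy rows, not the Walmart data.
# 'City_Category' is an assumed column name.
df = pd.DataFrame({
    "Gender": ["M", "M", "F", "F", "M", "F"],
    "City_Category": ["A", "C", "A", "C", "C", "A"],
    "Purchase": [9000, 10000, 8000, 9500, 11000, 8500],
})

# Mean purchase for every gender x city combination,
# pivoted so that cities become columns.
segment_means = (
    df.groupby(["Gender", "City_Category"])["Purchase"]
      .mean()
      .unstack("City_Category")
)

print(segment_means)
```

The same grouping drives the charts: `seaborn.barplot(data=df, x="City_Category", y="Purchase", hue="Gender")` plots the per-segment means as grouped bars.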
Python — confidence_intervals.py
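from scipy import stats

# Assumes df is the transactions DataFrame, already loaded with pandas,
# with 'Gender' and 'Purchase' columns.
male_purchases   = df.loc[df['Gender'] == 'M', 'Purchase']
female_purchases = df.loc[df['Gender'] == 'F', 'Purchase']

def mean_ci(sample, confidence=0.95):
    """t-based confidence interval for the mean of a sample."""
    return stats.t.interval(
        confidence,
        df=len(sample) - 1,        # degrees of freedom
        loc=sample.mean(),         # sample mean
        scale=stats.sem(sample),   # standard error of the mean
    )

# 95% confidence intervals for each group
ci_male   = mean_ci(male_purchases)
ci_female = mean_ci(female_purchases)

# Reported output:
# Male CI:   (9487.2, 9520.8)
# Female CI: (8719.4, 8748.6), no overlap → significant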
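The listing above covers the 95% level only. As a rough sketch of how the 90%, 95%, and 99% comparison might look, the block below repeats the interval at each level and checks for overlap. It runs on synthetic, normally distributed samples (the real purchase data is not reproduced here), and the helpers `mean_ci` and `intervals_overlap` are illustrative names, not taken from the project.

```python
import numpy as np
from scipy import stats


def mean_ci(sample, confidence):
    """t-based confidence interval for the mean of a 1-D sample."""
    sample = np.asarray(sample, dtype=float)
    return stats.t.interval(
        confidence,
        df=len(sample) - 1,
        loc=sample.mean(),
        scale=stats.sem(sample),
    )


def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic stand-ins for two groups; the parameters are made up for illustration.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=9500, scale=5000, size=20_000)
group_b = rng.normal(loc=8700, scale=4700, size=20_000)

results = {}
for level in (0.90, 0.95, 0.99):
    ci_a = mean_ci(group_a, level)
    ci_b = mean_ci(group_b, level)
    results[level] = (ci_a, ci_b, intervals_overlap(ci_a, ci_b))
    print(f"{level:.0%}: A=({ci_a[0]:.1f}, {ci_a[1]:.1f}) "
          f"B=({ci_b[0]:.1f}, {ci_b[1]:.1f}) overlap={results[level][2]}")
```

Higher confidence levels give wider intervals, so if the intervals stay apart at 99% they are also apart at 90% and 95%.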
04 — Key Findings

Business implications

👨
Males spend ~₹770 more on average
The confidence intervals for male and female purchases do not overlap — the difference is statistically significant, not just chance.
🏙️
City C has the highest spenders
Customers from City C (likely Tier-2 cities) spend more per transaction than those from City A or City B, despite the usual assumption that incomes there are lower.
👴
51–55 age group spends most
Plausibly established, higher-income customers. This segment deserves premium product placement and targeted promotions.
💍
Marital status barely matters
Singles spend slightly more but the difference is marginal. Not a useful segmentation axis for Walmart's marketing.
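The gender and marital-status findings both come down to whether a gap in means can be told apart from noise. Comparing interval overlap is a conservative way to check this; a two-sample test is more direct. The sketch below swaps in Welch's t-test, which is not part of the analysis described above, and runs it on constructed toy arrays rather than the Walmart data. The helper name `welch_pvalue` and the gap sizes are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Constructed toy data: 1,000 values alternating around a mean of 9,000.
base = np.tile([8000.0, 10000.0], 500)

# Two comparison groups: one shifted by a large gap, one by a small gap.
large_gap = base + 800.0
small_gap = base + 30.0


def welch_pvalue(a, b):
    """Two-sided p-value of Welch's t-test (unequal variances allowed)."""
    return stats.ttest_ind(a, b, equal_var=False).pvalue


p_large = welch_pvalue(large_gap, base)
p_small = welch_pvalue(small_gap, base)

print(f"large gap p-value: {p_large:.3g}")
print(f"small gap p-value: {p_small:.3g}")
```

With hundreds of thousands of rows, even a very small gap can reach statistical significance, so the size of the difference matters as much as the p-value when deciding whether a segment is worth targeting.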
05 — Tech Stack
Python 3 · Pandas · NumPy · Scipy Stats · Seaborn · Matplotlib · Confidence Intervals