Linear Regression · Predictive Modelling · Completed

Jamboree Education
Admission Chance Predictor

Building a linear regression model to predict a student's probability of admission to top US graduate schools — using GRE, TOEFL, CGPA, and research experience.

Type: Linear Regression
Domain: Education / Ed-Tech
Dataset: 500 students · 9 features
Tools: Python · Scikit-learn · Statsmodels
Course: Scaler Academic Case Study
Student Records: 500
CGPA Correlation: 0.88
Model Evaluation Metric
Missing Values: 0
01 — Business Problem

What's my chance of getting in?

Jamboree Education helps students prepare for GRE and GMAT exams. They want to give students a data-driven estimate of their graduate school admission probability — based on their academic profile. A reliable predictor helps students set realistic targets and invest study time where it matters most.

🎓
Why linear regression
The target variable (Chance of Admit) is continuous and ranges from 0 to 1. Linear regression is the natural baseline model — interpretable, fast, and well-suited for this feature set.
02 — Feature Correlations

What matters most

Feature | Correlation with Admission | Strength
CGPA | 0.88 | Very Strong
GRE Score | 0.81 | Strong
TOEFL Score | 0.79 | Strong
University Rating | 0.69 | Moderate
SOP | 0.68 | Moderate
LOR | 0.65 | Moderate
Research | 0.55 | Moderate-Weak
⚠️
Multicollinearity alert
GRE and TOEFL are highly correlated (~0.83), as are GRE and CGPA (~0.83). This multicollinearity should be checked with VIF scores and addressed before the regression model is finalised.
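The correlation figures above can be reproduced with pandas. The sketch below is only illustrative: the DataFrame is synthetic (the `ability` factor and all values are made up), and only the column names follow the ones used on this page.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the admissions data; every value here is made up.
rng = np.random.default_rng(0)
n = 200
ability = rng.normal(size=n)  # hypothetical hidden factor driving all scores

df = pd.DataFrame({
    "GRE Score": 316 + 11 * (0.9 * ability + 0.45 * rng.normal(size=n)),
    "TOEFL Score": 107 + 6 * (0.9 * ability + 0.45 * rng.normal(size=n)),
    "CGPA": 8.5 + 0.6 * (0.9 * ability + 0.45 * rng.normal(size=n)),
})
df["Chance of Admit "] = (0.72 + 0.12 * ability + 0.03 * rng.normal(size=n)).clip(0, 1)

target = "Chance of Admit "

# Pearson correlation of every feature with the target, strongest first.
target_corr = df.corr()[target].drop(target).sort_values(ascending=False)
print(target_corr.round(2))

# Pairwise correlations among the predictors; high values hint at multicollinearity.
feature_corr = df.drop(columns=target).corr()
print(feature_corr.round(2))
```

On the real data, the first printout would correspond to the table above and the second to the GRE/TOEFL/CGPA figures in the alert.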
03 — Methodology

Model pipeline

01
Data Cleaning
Dropped the Serial No. column (an identifier, not a feature). Confirmed zero duplicates and zero missing values. The dataset is clean out of the box.
drop() · duplicated().sum()
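A minimal sketch of these cleaning checks, assuming a small hypothetical frame in place of the real 500-row file (the rows below are invented for illustration):

```python
import pandas as pd

# Tiny hypothetical frame with the same kind of columns as the dataset.
df = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4],
    "GRE Score": [330, 321, 314, 325],
    "CGPA": [9.4, 8.7, 8.1, 8.9],
    "Chance of Admit ": [0.90, 0.75, 0.70, 0.82],
})

# The serial number only identifies a row, so it is removed before modelling.
df = df.drop(columns=["Serial No."])

n_duplicates = int(df.duplicated().sum())  # rows that repeat an earlier row exactly
n_missing = int(df.isna().sum().sum())     # missing cells across all columns
print(f"duplicates: {n_duplicates}, missing: {n_missing}")
```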
02
EDA & Outlier Check
Plotted distributions for all features and used boxplots to check for outliers. Outliers were minimal, so no treatment was required.
boxplot · histplot
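A boxplot flags points beyond 1.5 × IQR from the quartiles by default, so the same check can be done numerically. The helper name `iqr_outlier_counts` and the sample values below are hypothetical, chosen so that one CGPA value is clearly out of range.

```python
import pandas as pd

def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values per column that fall outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = frame.quantile(0.25)
    q3 = frame.quantile(0.75)
    iqr = q3 - q1
    outside = (frame < q1 - k * iqr) | (frame > q3 + k * iqr)
    return outside.sum()

# Hypothetical values; the last CGPA entry is deliberately extreme.
df = pd.DataFrame({
    "GRE Score": [300, 305, 310, 312, 315, 318, 320, 325, 330, 335],
    "CGPA": [7.8, 8.0, 8.1, 8.3, 8.4, 8.5, 8.6, 8.8, 9.0, 5.0],
})

counts = iqr_outlier_counts(df)
print(counts)
```

A column whose count is zero or very small, as reported for this dataset, would not need outlier treatment.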
03
VIF Analysis for Multicollinearity
Calculated the Variance Inflation Factor (VIF) for every feature. Identified GRE, TOEFL, and CGPA as multicollinear; these features were dropped or monitored accordingly.
statsmodels VIF · variance_inflation_factor()
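For intuition, the VIF of a feature is 1 / (1 − R²), where R² comes from regressing that feature on all the others. The sketch below computes it directly with NumPy instead of calling statsmodels, so it is a swapped-in illustration rather than the project's code; `variance_inflation_factor()` should give the same numbers when a constant column is included in the design matrix. The helper `vif_table` and the synthetic data are assumptions.

```python
import numpy as np
import pandas as pd

def vif_table(features: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R²_j), with R²_j from regressing feature j on the rest."""
    X = features.to_numpy(dtype=float)
    vifs = {}
    for j, name in enumerate(features.columns):
        y = X[:, j]
        # Intercept column plus every feature except the j-th one.
        design = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs[name] = 1.0 / (1.0 - r2)
    return pd.Series(vifs, name="VIF").sort_values(ascending=False)

# Synthetic data: three scores share a hidden factor, Research is independent.
rng = np.random.default_rng(0)
n = 300
ability = rng.normal(size=n)
features = pd.DataFrame({
    "GRE Score": 316 + 11 * (0.9 * ability + 0.45 * rng.normal(size=n)),
    "TOEFL Score": 107 + 6 * (0.9 * ability + 0.45 * rng.normal(size=n)),
    "CGPA": 8.5 + 0.6 * (0.9 * ability + 0.45 * rng.normal(size=n)),
    "Research": rng.integers(0, 2, size=n),
})

vif = vif_table(features)
print(vif.round(2))
```

Common rules of thumb treat a VIF above roughly 5 to 10 as a sign of problematic multicollinearity; the exact cut-off is a judgment call.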
04
Train-Test Split + Scaling
80/20 split. Fitted StandardScaler on the training set only, then applied the fitted scaler to the test set. This keeps test-set statistics out of the scaling step and so prevents data leakage.
train_test_split · StandardScaler
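To see why fitting on the training rows matters, the sketch below checks that the scaler's stored mean equals the training mean, so the test set is transformed with training statistics only. The single feature is randomly generated and purely hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=8.5, scale=0.6, size=(500, 1))  # hypothetical single feature

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)  # statistics come from the training rows only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)     # reuses the training mean and std

print("scaler mean:", scaler.mean_[0], "| train mean:", X_train.mean())
print("scaled train mean:", X_train_s.mean(), "| scaled test mean:", X_test_s.mean())
```

The scaled test mean is close to zero but not exactly zero, which is expected: the test rows were never used to compute the scaling parameters.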
05
Linear Regression + Evaluation
Trained LinearRegression. Evaluated with R², RMSE, and MAE. Also checked OLS summary via statsmodels for statistical significance of coefficients.
sklearn LinearRegression · statsmodels OLS
Python — regression_model.py
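import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# df is the cleaned DataFrame from step 01 (Serial No. already dropped).
# Note that the target column name ends with a trailing space.
X = df.drop('Chance of Admit ', axis=1)
y = df['Chance of Admit ']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then reuse it on the test set.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s  = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_s, y_train)

y_pred = model.predict(X_test_s)
print("R²:", r2_score(y_test, y_pred))
# R² ≈ 0.82: the model explains about 82% of the variance in admission chance
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("MAE:", mean_absolute_error(y_test, y_pred))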
04 — Key Findings

What the model tells us

📚
CGPA is the single most important factor
Correlation of 0.88 with admission chance. Students should prioritise undergrad GPA above almost everything else.
📝
GRE and TOEFL are important but redundant
Both are strongly correlated with admission chance and with each other (~0.83). A student who scores well on one tends to score well on the other, so each adds limited information once the other is in the model.
🔬
Research experience gives a meaningful boost
Binary feature (0/1) with 0.55 correlation. Students with research experience on their application tend to have a noticeably higher admission chance.
🏫
University rating matters but moderately
Strong university reputation helps, but it's a weaker predictor than personal academic performance metrics.
05 — Tech Stack
Python 3 · Pandas · Scikit-learn · Statsmodels · LinearRegression · VIF Analysis · StandardScaler