Jamboree Education Admission Predictor

Jamboree Education
Admission Chance Predictor

Building a linear regression model to predict a student's probability of admission to top US graduate schools — using GRE, TOEFL, CGPA, and research experience.

Type: Linear Regression

Domain: Education / Ed-Tech

Dataset: 500 students · 9 columns (7 predictors, an identifier, and the target)

Tools: Python · Scikit-learn · Statsmodels

Course: Scaler Academic Case Study

What's my chance of getting in?

Jamboree Education helps students prepare for GRE and GMAT exams. They want to give students a data-driven estimate of their graduate school admission probability — based on their academic profile. A reliable predictor helps students set realistic targets and invest study time where it matters most.

🎓

Why linear regression

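The target variable (Chance of Admit) is continuous and ranges from 0 to 1. Linear regression is a natural baseline: interpretable, fast, and well suited to this feature set. Because a linear model's output is unbounded, predictions for extreme profiles can fall slightly outside that range and may need clipping.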

What matters most

Feature	Correlation with Admission	Strength
CGPA	0.88	Very Strong
GRE Score	0.81	Strong
TOEFL Score	0.79	Strong
University Rating	0.69	Moderate
SOP	0.68	Moderate
LOR	0.65	Moderate
Research	0.55	Moderate-Weak
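
A ranking like the one in the table can be produced with a plain Pearson correlation against the target. The sketch below is illustrative only: it runs on synthetic data (the `ability` factor and all numeric values are made up), so its output will not match the figures above. The column names, including the trailing space in 'Chance of Admit ', follow the names used elsewhere in this write-up.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the admissions data; every value here is made up.
rng = np.random.default_rng(0)
n = 500
ability = rng.normal(size=n)  # hypothetical latent factor driving all scores
df = pd.DataFrame({
    "GRE Score": 316 + 11 * (0.8 * ability + 0.6 * rng.normal(size=n)),
    "TOEFL Score": 107 + 6 * (0.8 * ability + 0.6 * rng.normal(size=n)),
    "CGPA": 8.5 + 0.6 * (0.9 * ability + 0.44 * rng.normal(size=n)),
    "Research": (ability + rng.normal(size=n) > 0).astype(int),
    "Chance of Admit ": 0.72 + 0.14 * (0.9 * ability + 0.44 * rng.normal(size=n)),
})

target = "Chance of Admit "

# Pearson correlation of every feature with the target, strongest first.
corr_with_target = (
    df.corr()[target]
    .drop(target)
    .sort_values(ascending=False)
)
print(corr_with_target.round(2))
```

Correlation measures association only; a high value does not by itself show that raising a score would raise the admission chance.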

⚠️

Multicollinearity alert

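GRE and TOEFL are highly correlated (~0.83), as are GRE and CGPA (~0.83). Correlated predictors inflate the variance of the fitted coefficients and make them hard to interpret individually, so this multicollinearity is checked with VIF scores before the regression model is finalised.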

Model pipeline

Data Cleaning

Dropped Serial No. column (identifier, not a feature). Confirmed zero duplicates and zero missing values. Dataset is clean out of the box.

drop() · duplicated().sum()
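
The cleaning checks can be sketched in a few lines of pandas. The four-row frame below is an illustrative stand-in, not the real dataset, and the names `raw`, `n_duplicates`, and `n_missing` are chosen here for the example.

```python
import pandas as pd

# Tiny illustrative frame with the same kind of columns as the admissions data.
raw = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4],
    "GRE Score": [337, 324, 316, 322],
    "CGPA": [9.65, 8.87, 8.00, 8.67],
    "Chance of Admit ": [0.92, 0.76, 0.72, 0.80],
})

# Drop the row identifier: it carries no predictive information.
df = raw.drop(columns=["Serial No."])

n_duplicates = int(df.duplicated().sum())  # fully identical rows
n_missing = int(df.isna().sum().sum())     # missing cells across all columns

print(f"duplicates={n_duplicates}, missing={n_missing}")
```

Checking for duplicates after the identifier is dropped matters: while a unique serial number is present, no two rows can ever compare as equal.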

EDA & Outlier Check

Plotted distributions for all features and used boxplots to check for outliers. Concluded that outliers are minimal and that no treatment is required.

boxplot · histplot
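
A numeric counterpart to the visual boxplot check is to count points outside the 1.5 × IQR fences, the convention boxplot whiskers normally follow. This is a sketch under that assumption; the helper `iqr_outlier_counts` and the demo values are hypothetical and not taken from the original analysis.

```python
import pandas as pd


def iqr_outlier_counts(frame: pd.DataFrame, k: float = 1.5) -> pd.Series:
    """Count values outside [Q1 - k*IQR, Q3 + k*IQR] for each numeric column."""
    numeric = frame.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparisons align the per-column fences with the frame's columns.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return outside.sum()


# Made-up example: one obviously extreme CGPA value.
demo = pd.DataFrame({
    "GRE Score": [310, 315, 320, 325, 330, 318, 322],
    "CGPA": [8.1, 8.4, 8.6, 8.9, 9.1, 8.5, 2.0],
})
counts = iqr_outlier_counts(demo)
print(counts)
```

Here only the CGPA value of 2.0 falls outside its fences, so the count is 0 for GRE Score and 1 for CGPA.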

VIF Analysis for Multicollinearity

Calculated Variance Inflation Factor for all features. Identified GRE, TOEFL, and CGPA as multicollinear. Dropped or monitored accordingly.

statsmodels VIF · variance_inflation_factor()

Train-Test Split + Scaling

80/20 split. Fitted StandardScaler on the training set only, then used it to transform both the training and test sets. Fitting on training data alone keeps test-set statistics from leaking into the model.

train_test_split · StandardScaler
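
One way to enforce the fit-on-training-only rule automatically is to wrap the scaler and the model in a scikit-learn pipeline, so the scaler is refit inside every cross-validation fold. This is an optional variant, not something the write-up says was used, and it runs on synthetic data with made-up coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data; the coefficients and noise level are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = 0.72 + X @ np.array([0.05, 0.03, 0.08, 0.01]) + 0.03 * rng.normal(size=500)

# The scaler is fitted only on the training part of each fold.
pipe = make_pipeline(StandardScaler(), LinearRegression())
scores = cross_val_score(pipe, X, y, cv=5, scoring="r2")

print("Mean cross-validated R²:", scores.mean())
```

For ordinary least squares, scaling does not change the predictions; its value here is that standardised coefficients become comparable across features.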

Linear Regression + Evaluation

Trained LinearRegression. Evaluated with R², RMSE, and MAE. Also checked OLS summary via statsmodels for statistical significance of coefficients.

sklearn LinearRegression · statsmodels OLS

Python — regression_model.py

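import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# df is assumed to be the cleaned DataFrame with 'Serial No.' already dropped.
# Note the trailing space in the target column name.
X = df.drop('Chance of Admit ', axis=1)
y = df['Chance of Admit ']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the scaler on the training set only, then apply it to both splits.
scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s  = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_s, y_train)

y_pred = model.predict(X_test_s)
print("R²:", r2_score(y_test, y_pred))
# R² ≈ 0.82: the model explains about 82% of the variance in admission chance
print("RMSE:", np.sqrt(mean_squared_error(y_test, y_pred)))
print("MAE:", mean_absolute_error(y_test, y_pred))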

What the model tells us

📚

CGPA is the single most important factor

Correlation of 0.88 with admission chance. Students should prioritise undergrad GPA above almost everything else.

📝

GRE and TOEFL are important but redundant

Both are strongly correlated with admission chance and with each other (~0.83). Students who score well on one tend to score well on the other, so each adds limited information once the other is known.

🔬

Research experience gives a meaningful boost

Binary feature (0/1) with 0.55 correlation. Applicants with research experience tend to have a noticeably higher chance of admission, although the correlation is weaker than for the academic scores.

🏫

University rating matters but moderately

Strong university reputation helps, but it's a weaker predictor than personal academic performance metrics.
