Logistic regression model to predict loan default risk — helping LoanTap's credit team decide who to approve, who to reject, and at what threshold.
LoanTap is an online platform that evaluates personal loan applications. The business challenge: build a model that identifies loan defaulters before the loan is issued. A missed defaulter costs money. An over-cautious model rejects creditworthy customers. The right balance is a business decision, not just a technical one.
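One way to turn that business decision into a number is to give each error type a cost and choose the probability threshold that minimises total cost. The sketch below is a minimal illustration. The costs (a missed defaulter costs 5 units, a wrongly rejected good customer costs 1 unit), the toy scores, and the helper `best_threshold` are all hypothetical. Real costs would have to come from LoanTap's credit team.

```python
import numpy as np

def best_threshold(y_true, y_prob, cost_fn, cost_fp, grid=None):
    """Hypothetical helper: return the threshold with the lowest total cost.

    y_true:  1 = defaulter, 0 = Fully Paid
    cost_fn: cost of approving a defaulter (false negative)
    cost_fp: cost of rejecting a creditworthy customer (false positive)
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # candidate thresholds 0.00 .. 1.00
    costs = []
    for t in grid:
        y_pred = (y_prob >= t).astype(int)  # flag as defaulter at or above t
        fn = np.sum((y_pred == 0) & (y_true == 1))  # defaulters approved
        fp = np.sum((y_pred == 1) & (y_true == 0))  # good customers rejected
        costs.append(cost_fn * fn + cost_fp * fp)
    i = int(np.argmin(costs))  # first threshold with the minimum cost
    return float(grid[i]), float(costs[i])

# Toy labels and scores (hypothetical, for illustration only)
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 1])
y_prob = np.array([0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.60, 0.40, 0.15, 0.80])

t, cost = best_threshold(y_true, y_prob, cost_fn=5, cost_fp=1)
print(f"threshold={t:.2f}, total cost={cost}")
```

Raising `cost_fn` relative to `cost_fp` pushes the chosen threshold down (more applicants flagged as risky); making missed defaulters cheaper pushes it up.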
| Column | Action | Reason |
|---|---|---|
| emp_title | Dropped | High cardinality — too many unique values |
| title | Dropped | High cardinality — free text, not useful |
| address | Dropped | Not a predictive feature for credit risk |
| emp_length | Imputed | 4.6% missing — filled with mode |
| mort_acc | Imputed | 9.5% missing — filled with median by grade |
| pub_rec_bankruptcies | Imputed | Small % missing — filled with 0 |
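The column actions in the table can be sketched in pandas as follows. This assumes the raw data is a DataFrame with these column names plus a `grade` column, which the per-grade median for `mort_acc` needs. The function `clean_loans` and the five-row toy frame are hypothetical and only illustrate each step.

```python
import numpy as np
import pandas as pd

def clean_loans(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the drop/impute actions listed in the table."""
    # Drop high-cardinality / non-predictive columns
    df = df.drop(columns=["emp_title", "title", "address"])
    # emp_length: fill missing values with the most frequent category (mode)
    df["emp_length"] = df["emp_length"].fillna(df["emp_length"].mode()[0])
    # mort_acc: fill with the median mort_acc of loans in the same grade
    df["mort_acc"] = df["mort_acc"].fillna(
        df.groupby("grade")["mort_acc"].transform("median")
    )
    # pub_rec_bankruptcies: treat missing as zero recorded bankruptcies
    df["pub_rec_bankruptcies"] = df["pub_rec_bankruptcies"].fillna(0)
    return df

# Toy data (hypothetical values)
raw = pd.DataFrame({
    "emp_title": ["Teacher", "Engineer", None, "Nurse", "Driver"],
    "title": ["Debt consolidation", "Car", "Home", "Other", "Car"],
    "address": ["a", "b", "c", "d", "e"],
    "emp_length": ["10+ years", None, "10+ years", "2 years", "10+ years"],
    "grade": ["A", "A", "B", "B", "B"],
    "mort_acc": [1.0, np.nan, 2.0, 4.0, np.nan],
    "pub_rec_bankruptcies": [0.0, np.nan, 1.0, 0.0, np.nan],
})

clean = clean_loans(raw)
print(clean)
```

In a full pipeline, the mode and per-grade medians would normally be computed on the training split only and then reused for the test split, so that no test-set information leaks into training.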
```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve, classification_report

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

y_prob = model.predict_proba(X_test_scaled)[:, 1]
y_pred = model.predict(X_test_scaled)

roc_auc = roc_auc_score(y_test, y_prob)
print("ROC-AUC:", roc_auc)

# Problem: model predicts almost all as class 0 (Fully Paid)
# Defaulter recall ≈ 0 despite threshold tuning
# Root cause: 80:20 class imbalance
# Fix: class_weight='balanced' or SMOTE oversampling
```
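The first fix named above, `class_weight='balanced'`, reweights each class by `n_samples / (n_classes * class_count)`. At 80:20 that gives weights of about 0.625 for Fully Paid and 2.5 for defaulters, so errors on defaulters count roughly four times as much in the loss. The sketch below compares the two settings on synthetic 80:20 data from `make_classification`. This data is a stand-in, not the LoanTap dataset, so the printed numbers will not match the project's results. SMOTE, which lives in the separate `imbalanced-learn` package, is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: ~80% class 0 (Fully Paid), ~20% class 1 (defaulter)
X, y = make_classification(
    n_samples=5000, n_features=10, n_informative=4,
    weights=[0.8, 0.2], class_sep=0.5, random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42,
)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

results = {}
for cw in (None, "balanced"):
    model = LogisticRegression(max_iter=1000, class_weight=cw)
    model.fit(X_train_scaled, y_train)
    y_prob = model.predict_proba(X_test_scaled)[:, 1]
    y_pred = model.predict(X_test_scaled)  # default 0.5 cut-off
    results[cw] = {
        "roc_auc": roc_auc_score(y_test, y_prob),
        "defaulter_recall": recall_score(y_test, y_pred),  # recall of class 1
    }
    print(f"class_weight={cw}: ROC-AUC={results[cw]['roc_auc']:.3f}, "
          f"defaulter recall={results[cw]['defaulter_recall']:.3f}")
```

ROC-AUC typically changes little between the two runs, because reweighting mostly shifts where predictions fall relative to the 0.5 cut-off. Defaulter recall rises, at the price of more Fully Paid loans being flagged.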