Feature engineering and exploratory analysis on Delhivery's shipment data to find where actual delivery times diverge from routing predictions — and why.
Delhivery uses OSRM (Open Source Routing Machine) to predict delivery times and distances. OSRM works from road-network data alone, so its estimates don't account for traffic jams, loading delays, or idle time between segments. The goal is to quantify the gap between predicted and actual delivery performance and to identify where the biggest inefficiencies are.

    # df: the shipment DataFrame, assumed to be already loaded with pandas

    # IQR-based outlier capping
    def cap_outliers(col):
        Q1 = df[col].quantile(0.25)
        Q3 = df[col].quantile(0.75)
        IQR = Q3 - Q1
        # Clamp values outside the 1.5*IQR fences to the fence values
        df[col] = df[col].clip(lower=Q1 - 1.5 * IQR, upper=Q3 + 1.5 * IQR)

    # Engineered gap features (positive = actual exceeds the OSRM figure)
    df['time_gap'] = df['actual_time'] - df['osrm_time']
    df['dist_gap'] = df['actual_distance_to_destination'] - df['osrm_distance']

    # Busiest corridors: top 5 source-destination pairs by row count
    top_routes = (
        df.groupby(['source_center', 'destination_center'])
        .size()
        .sort_values(ascending=False)
        .head(5)
    )
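
To show how the gap feature could point at the least efficient corridors, here is a minimal sketch on invented toy data. The helper `rank_corridors_by_gap`, its `min_trips` threshold, and the use of the median are illustrative assumptions, not part of the original analysis. Only the column names come from the snippet above.

```python
import pandas as pd

# Toy data: all values are invented purely for illustration.
toy = pd.DataFrame({
    "source_center": ["A", "A", "A", "B", "B", "C"],
    "destination_center": ["B", "B", "B", "C", "C", "A"],
    "actual_time": [120.0, 150.0, 135.0, 60.0, 66.0, 200.0],
    "osrm_time": [60.0, 75.0, 90.0, 55.0, 60.0, 100.0],
})


def rank_corridors_by_gap(frame, min_trips=2):
    """Hypothetical helper: rank corridors by median (actual - OSRM) time gap."""
    gaps = frame.assign(time_gap=frame["actual_time"] - frame["osrm_time"])
    summary = (
        gaps.groupby(["source_center", "destination_center"])["time_gap"]
        .agg(trips="size", median_gap="median")
        .reset_index()
    )
    # Drop corridors with too few trips for the median to mean much.
    summary = summary[summary["trips"] >= min_trips]
    # Largest median gap first.
    return summary.sort_values("median_gap", ascending=False).reset_index(drop=True)


ranked = rank_corridors_by_gap(toy)
print(ranked)
```

The median is used here instead of the mean so that a single extreme trip does not dominate a corridor's score. If outliers have already been capped with `cap_outliers`, the mean would be a reasonable alternative.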