Web scraping 1,000 customer reviews, sentiment analysis with TextBlob, and a Random Forest model to predict which customers will complete a booking — all in the British Airways Forage simulation.
The British Airways Forage simulation covers two real data science tasks that reflect what BA's data teams actually do: understanding customer sentiment from unstructured review data and predicting which customers will follow through with a booking.
British Airways customer reviews from Skytrax were scraped using Python's requests and BeautifulSoup libraries. Scraping 10 pages at 100 reviews per page yielded 1,000 customer opinions on everything from cabin crew to food to delays.
```python
import requests
import pandas as pd
from bs4 import BeautifulSoup
from textblob import TextBlob

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
reviews = []

# Scrape 10 pages × 100 reviews per page = 1,000 reviews
for i in range(1, 11):
    url = f"{base_url}/page/{i}/?pagesize=100"
    response = requests.get(url, timeout=30)
    soup = BeautifulSoup(response.text, "html.parser")
    # Each review body sits in a div marked itemprop="reviewBody"
    for review in soup.find_all("div", itemprop="reviewBody"):
        reviews.append(review.get_text())

# Collect the scraped text into a DataFrame before scoring
df = pd.DataFrame({"reviews": reviews})

# Sentiment scoring: TextBlob polarity ranges from -1 (negative) to +1 (positive)
df["sentiment"] = df["reviews"].apply(lambda x: TextBlob(x).sentiment.polarity)
# BA reviews: skewed negative (delays, service complaints)
```
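Two small steps often sit around the scoring: cleaning the raw review text and turning continuous polarity scores into labels that can be counted. The sketch below is plain Python under stated assumptions. It assumes Skytrax review bodies begin with a verification tag such as "✅ Trip Verified | " or "Not Verified | ", and it uses a hypothetical ±0.05 neutral band to bucket polarity. That threshold is an illustrative choice, not a TextBlob default.

```python
import re

# Assumption: Skytrax review bodies start with a verification tag such as
# "✅ Trip Verified | " or "Not Verified | ". This pattern strips that tag.
VERIFY_PREFIX = re.compile(r"^\s*(?:✅\s*)?(?:Trip Verified|Not Verified)\s*\|\s*")


def clean_review(text: str) -> str:
    """Remove the assumed verification tag and surrounding whitespace."""
    return VERIFY_PREFIX.sub("", text).strip()


def label_polarity(polarity: float, threshold: float = 0.05) -> str:
    """Bucket a polarity score in [-1, 1] into positive/neutral/negative.

    The neutral band of +/- threshold is a hypothetical, illustrative choice.
    """
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


def label_shares(polarities):
    """Return the fraction of scores that fall into each label."""
    counts = {"positive": 0, "neutral": 0, "negative": 0}
    for p in polarities:
        counts[label_polarity(p)] += 1
    total = len(polarities)
    if total == 0:
        return {k: 0.0 for k in counts}
    return {k: v / total for k, v in counts.items()}


print(clean_review("✅ Trip Verified |  Flight was delayed 3 hours."))
print(label_shares([-0.4, -0.1, 0.0, 0.02, 0.3]))
```

In the scraping pipeline, something like `df["reviews"].map(clean_review)` could run before scoring, so the tag text does not influence polarity. `label_polarity` could then be applied to `df["sentiment"]` to count how many reviews lean negative.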
Using the BA customer booking dataset, the goal was to build a model that predicts booking_complete (1 or 0) from features like purchase lead time, trip type, flight hour, and add-on preferences.
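```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import train_test_split

# df holds the BA customer booking dataset (already loaded)

# One-hot encode categoricals; drop_first removes one redundant dummy per column
df_enc = pd.get_dummies(df, columns=['sales_channel', 'trip_type', 'flight_day'], drop_first=True)
# Drop the high-cardinality route and booking_origin columns
df_enc = df_enc.drop(columns=['route', 'booking_origin'])

X = df_enc.drop('booking_complete', axis=1)
y = df_enc['booking_complete']

# Hold out a test set; test_size and stratify are assumptions (original split not shown)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

rf = RandomForestClassifier(random_state=42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))  # 85.1% in the original run
print(classification_report(y_test, y_pred))
```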
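Accuracy alone can flatter a classifier when one class dominates. If most customers in the data do not complete a booking, always predicting 0 already scores well. The sketch below is written in plain Python and uses hypothetical toy labels, not the BA data. It compares a model's accuracy with the majority-class baseline and computes precision and recall for the positive class (booking_complete = 1), which is what the classification report breaks out.

```python
from collections import Counter


def majority_baseline_accuracy(y_true):
    """Accuracy of always predicting the most common label."""
    if not y_true:
        raise ValueError("y_true is empty")
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision, "recall": recall}


# Hypothetical toy labels: 8 non-bookings (0) and 2 completed bookings (1)
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(majority_baseline_accuracy(y_true))
print(binary_metrics(y_true, y_pred))
```

In this toy case, the "model" matches the 0.8 baseline accuracy while catching only half of the actual bookings. Running the same comparison on the real test labels and the Random Forest's predictions would show how much the reported 85.1% improves on simply predicting "no booking". scikit-learn's `DummyClassifier(strategy="most_frequent")` provides the same baseline inside a scikit-learn workflow.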