Real Estate · Forecasting · Classification · 9 min read

Rio Grande Builders

Lead Scoring & Neighborhood Demand Forecasting for a South Texas Home Builder

Python · XGBoost · Prophet · scikit-learn · Pandas · Recharts

-62% reduction in unsold inventory
+34% improvement in lead-to-close rate
-28% marketing cost per acquisition (CPA)

01 The Challenge

Rio Grande Builders is a mid-size residential construction company operating across Hidalgo, Cameron, and Starr counties. They build 40-60 homes per year — primarily single-family starter homes and mid-range custom builds. Their sales pipeline was entirely relationship-driven: the owner and two sales reps relied on word of mouth, drive-by lot scouting, and gut instinct to decide where to build next.

The problems were concrete: they broke ground on a 12-home subdivision in a neighborhood where demand had already peaked, leaving 4 units unsold for over 9 months. Meanwhile, a competitor moved into an adjacent ZIP code that showed clear growth signals they had missed. They had no systematic way to identify which neighborhoods were heating up, which leads were most likely to convert, or how to allocate their limited marketing budget across a three-county footprint.

Data Landscape

Sales were tracked in a shared Excel file, leads arrived through a generic Gmail inbox, and county permit data was never analyzed. Three years of CRM-equivalent history, public permit filings from three counties, MLS listing data, Census/ACS demographics, and Google Trends search volume were all available but never connected.

02 Our Approach

We framed this as three interconnected modeling problems: neighborhood demand scoring to identify where to build, lead scoring to identify who to sell to, and budget optimization to allocate marketing spend efficiently. Each model feeds the next, and all surface through a unified dashboard and CRM integration.

  • XGBoost Classifier: gradient-boosted lead scoring model trained on 3 years of CRM data to rank prospects 0-100
  • Prophet + GeoPandas: time-series permit forecasting with spatial smoothing to capture neighborhood spillover effects
  • Mapbox GL JS: interactive census tract map colored by demand score for the sales team's weekly planning
  • scipy Optimization: constrained marketing budget allocation across ZIP codes, weighted by demand score and channel ROI
  • HubSpot API: automated lead score injection into the CRM, replacing the shared Excel file with structured pipeline tracking
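The most direct way one model feeds the next is that the demand model's per-tract score becomes the `tract_demand_score` input of the lead model (see the feature list under Technical Details). The pandas sketch below shows that join plus a few of the other listed features. It is illustrative only: the column names `tract_id`, `created_at`, and `first_contact_at`, the meteorological-season definition of `season`, and all sample rows are hypothetical. `median_income` would presumably be joined the same way from ACS tables.

```python
import pandas as pd

# Hypothetical lead rows; tract_id, created_at and first_contact_at are illustrative column names.
leads = pd.DataFrame({
    "lead_id": [1, 2, 3],
    "tract_id": ["tract_A", "tract_B", "tract_A"],
    "lead_source": ["referral", "web_form", "drive_by"],
    "created_at": pd.to_datetime(["2024-03-04 09:00", "2024-07-15 14:30", "2024-11-02 10:00"]),
    "first_contact_at": pd.to_datetime(["2024-03-04 11:00", "2024-07-17 14:30", "2024-11-02 10:30"]),
})

# Output of the demand model: one 0-100 score per census tract (toy values).
tract_scores = pd.DataFrame({
    "tract_id": ["tract_A", "tract_B"],
    "tract_demand_score": [82.0, 41.0],
})

# Meteorological seasons by month (an assumed definition of the `season` feature).
SEASON_BY_MONTH = {m: s for s, months in {
    "winter": (12, 1, 2), "spring": (3, 4, 5),
    "summer": (6, 7, 8), "fall": (9, 10, 11),
}.items() for m in months}

def build_lead_features(leads, tract_scores):
    # Left join keeps every lead; validate= raises if a tract appears twice in tract_scores.
    feats = leads.merge(tract_scores, on="tract_id", how="left", validate="many_to_one")
    gap = feats["first_contact_at"] - feats["created_at"]
    feats["time_to_first_contact"] = gap.dt.total_seconds() / 3600.0  # hours
    feats["referral_flag"] = (feats["lead_source"] == "referral").astype(int)
    feats["season"] = feats["created_at"].dt.month.map(SEASON_BY_MONTH)
    return feats

features = build_lead_features(leads, tract_scores)
print(features[["lead_id", "tract_demand_score", "time_to_first_contact",
                "referral_flag", "season"]])
```

A left join is used so that a lead in a tract without a score yet still reaches the model, with a missing value that gradient-boosted trees such as XGBoost can handle natively.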

Pipeline: County Permit Data (3 counties, scraped weekly) → Feature Engineering (Census + MLS + Trends) → Demand Scoring (per census tract, 6-12 month horizon) → Lead Model (XGBoost, 0-100 score) → Dashboard + CRM (React + HubSpot)

03 Key Findings

Neighborhood Demand Ranking

Census tracts ranked by 12-month demand score (0-100). Scores combine permit velocity, MLS absorption rate, population growth, and spatial spillover from adjacent tracts. The top 6 tracts account for over 60% of near-term opportunity.
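The case study names the four signals behind the demand score but not how they are combined. One plausible sketch, assuming min-max normalization of each signal across tracts and a fixed weighted sum scaled to 0-100, is shown below. The weights in `WEIGHTS`, the `neighbor_spillover` column, and the tract values are all hypothetical.

```python
import pandas as pd

# Hypothetical weights: the source lists the four signals but not their weighting.
WEIGHTS = {
    "permit_velocity": 0.35,     # new permits per month
    "absorption_rate": 0.30,     # share of MLS listings going under contract
    "pop_growth": 0.20,          # annual population growth (Census/ACS)
    "neighbor_spillover": 0.15,  # smoothed signal from adjacent tracts
}

def minmax(col: pd.Series) -> pd.Series:
    """Rescale a column to [0, 1]; a constant column maps to 0."""
    span = col.max() - col.min()
    if span == 0:
        return pd.Series(0.0, index=col.index)
    return (col - col.min()) / span

def demand_score(tracts: pd.DataFrame) -> pd.Series:
    """Weighted sum of normalized signals, scaled to 0-100."""
    total = sum(WEIGHTS.values())
    combined = sum(w * minmax(tracts[name]) for name, w in WEIGHTS.items()) / total
    return (100 * combined).round(1)

tracts = pd.DataFrame(
    {
        "permit_velocity": [12, 3, 7, 1],
        "absorption_rate": [0.9, 0.4, 0.6, 0.3],
        "pop_growth": [0.031, 0.008, 0.020, -0.002],
        "neighbor_spillover": [0.8, 0.2, 0.7, 0.1],
    },
    index=["tract_A", "tract_B", "tract_C", "tract_D"],
)
tracts["demand_score"] = demand_score(tracts)
print(tracts.sort_values("demand_score", ascending=False))
```

Min-max scaling makes the score relative: the strongest tract on every signal lands at 100 and the weakest at 0, so scores are comparable within one weekly refresh but not necessarily across refreshes.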

Lead Score Distribution: Before vs. After

Before ML scoring, leads were treated nearly equally: a flat distribution with no clear separation. After deployment, the model produces a bimodal split: low-probability leads cluster below 20, while high-value prospects concentrate above 80, so the sales team can spend its time on the top of the list.

Permit Volume Forecast

36 months of historical county permit filings with a 12-month Prophet forecast and 80% confidence band. The model captures the seasonal spring-summer construction surge and projects continued growth into 2025.

04 Business Impact

Metric: before → after (change)
Unsold Inventory: 5.2 units avg → 2.0 units avg (-62%)
Lead-to-Close Rate: 8.1% → 10.9% (+34%)
Marketing Cost per Acquisition: baseline → optimized (-28%)

Projected Annual Value

62% reduction in unsold inventory within 6 months

The demand scoring map became the centerpiece of the owner's weekly planning meetings. Instead of debating which neighborhoods "felt hot," the team now reviews tract-level scores updated every Monday morning. The first decision it influenced: they pivoted a planned 8-unit subdivision from a cooling tract to one ranked in the top 5 — all 8 units were under contract within 4 months.

Lead scoring changed how the sales reps spend their mornings. With scores auto-populated in HubSpot, they sort by priority and work the top 20 first. The +34% lift in lead-to-close rate came not from getting better leads, but from spending more time on the right ones.

05 Technical Details

Lead Scoring Model (XGBoost)

  • Features: lead_source, time_to_first_contact, tract_demand_score, median_income, referral_flag, season
  • Target: binary (converted vs. not), outputs calibrated probability scaled 0-100
  • Evaluation: AUC = 0.81, precision@top-20% = 0.67 (5-fold CV)
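To make the output and evaluation bullets concrete, here is a NumPy-only sketch of the 0-100 scaling and of the two reported metrics, applied to toy out-of-fold predictions. The function names and data are hypothetical; the XGBoost training, the calibration step, and the 5-fold CV loop are not shown. The AUC uses the Mann-Whitney formulation, which should agree with `sklearn.metrics.roc_auc_score`.

```python
import numpy as np

def to_lead_score(prob):
    """Map calibrated conversion probabilities in [0, 1] to integer scores 0-100."""
    return np.rint(np.clip(np.asarray(prob, dtype=float), 0.0, 1.0) * 100).astype(int)

def precision_at_top_frac(y_true, y_score, frac=0.20):
    """Share of actual conversions among the top `frac` of leads by score."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    k = max(1, int(np.ceil(frac * len(y_score))))
    top_k = np.argsort(-y_score, kind="stable")[:k]
    return float(y_true[top_k].mean())

def roc_auc(y_true, y_score):
    """ROC AUC as the probability that a random converter outscores a random
    non-converter (Mann-Whitney formulation; ties count as one half)."""
    y_true = np.asarray(y_true).astype(bool)
    y_score = np.asarray(y_score, dtype=float)
    pos, neg = y_score[y_true], y_score[~y_true]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((wins + 0.5 * ties) / (len(pos) * len(neg)))

# Toy out-of-fold predictions for 10 leads (1 = converted).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
probs = [0.91, 0.15, 0.78, 0.40, 0.05, 0.35, 0.62, 0.10, 0.22, 0.08]

print(to_lead_score(probs))                  # scores as they would appear in the CRM
print(precision_at_top_frac(y_true, probs))  # top 20% = top 2 of 10 leads
print(roc_auc(y_true, probs))
```

Precision at the top 20% mirrors how the reps work the list (highest scores first), which is why it complements AUC, a measure of ranking quality across all leads.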

Demand Forecast (Prophet + GeoPandas)

  • Granularity: monthly permit volume per census tract
  • Spatial smoothing: inverse-distance weighting from adjacent tracts
  • Accuracy: MAPE = 11% on 12-month holdout across 24 tracts
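The spatial smoothing bullet says only "inverse-distance weighting from adjacent tracts." A minimal sketch, assuming each tract's forecast is blended with the inverse-distance-weighted mean of its neighbors' forecasts, is below, together with the MAPE metric used for the holdout. The blend weight `self_weight=0.6`, `power=1.0`, and the hard-coded centroids and adjacency are assumptions; in practice GeoPandas could derive adjacency and centroids from tract geometries.

```python
import numpy as np

def idw_smooth(values, coords, neighbors, self_weight=0.6, power=1.0):
    """Blend each tract's forecast with an inverse-distance-weighted mean of its
    adjacent tracts. Reads only the unsmoothed inputs, so tract order does not matter."""
    values = np.asarray(values, dtype=float)
    coords = np.asarray(coords, dtype=float)
    smoothed = values.copy()
    for i, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated tract: keep its own forecast
        dist = np.linalg.norm(coords[nbrs] - coords[i], axis=1)
        w = 1.0 / dist**power
        spillover = np.dot(w, values[nbrs]) / w.sum()
        smoothed[i] = self_weight * values[i] + (1 - self_weight) * spillover
    return smoothed

def mape(actual, forecast):
    """Mean absolute percentage error in percent; undefined when any actual is 0."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))

# Toy example: 3 tracts, monthly permit forecasts, centroids in km.
forecast = [10.0, 20.0, 4.0]
centroids = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
adjacency = {0: [1, 2], 1: [0], 2: [0]}
smoothed = idw_smooth(forecast, centroids, adjacency)
print(smoothed)
```

Because MAPE divides by the actual value, tract-months with zero permits would need to be excluded or aggregated before computing it; the source does not say how the 24-tract holdout handled this.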

Marketing Optimization (scipy)

  • Method: constrained linear programming via scipy.optimize.linprog
  • Constraints: total monthly budget cap, minimum spend per active ZIP
  • Objective: maximize expected conversions weighted by tract demand score
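A sketch of the allocation as a linear program with `scipy.optimize.linprog` follows, assuming expected conversions per dollar in each ZIP (`conv_per_dollar`, a stand-in for channel ROI) are known and weighted by the 0-100 demand score. The total budget cap and per-ZIP minimum come from the list above; the per-ZIP ceiling `max_spend` is a hypothetical saturation limit added for illustration, and all numbers are toy values.

```python
import numpy as np
from scipy.optimize import linprog

def allocate_budget(demand_score, conv_per_dollar, budget, min_spend, max_spend):
    """Split a monthly budget across ZIP codes to maximize demand-weighted conversions."""
    demand_score = np.asarray(demand_score, dtype=float)
    conv_per_dollar = np.asarray(conv_per_dollar, dtype=float)
    n = len(demand_score)
    # Expected conversions per dollar, weighted by the 0-100 tract demand score.
    value = conv_per_dollar * demand_score / 100.0
    res = linprog(
        c=-value,                             # linprog minimizes, so negate to maximize
        A_ub=np.ones((1, n)), b_ub=[budget],  # total spend <= monthly budget cap
        bounds=[(min_spend, max_spend)] * n,  # minimum per active ZIP, hypothetical ceiling
        method="highs",
    )
    if not res.success:
        raise ValueError(f"allocation infeasible: {res.message}")
    return res.x

alloc = allocate_budget(
    demand_score=[90, 60, 30],
    conv_per_dollar=[0.004, 0.005, 0.006],
    budget=10_000, min_spend=1_000, max_spend=5_000,
)
print(alloc.round(2))
```

With a purely linear objective and only a total cap and per-ZIP minimums, the optimum gives every ZIP its minimum and puts all remaining budget into the single highest-value ZIP; an upper bound (or a diminishing-returns model) is one way to keep spend spread across several ZIP codes.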
