Rio Grande Builders
Lead Scoring & Neighborhood Demand Forecasting for a South Texas Home Builder
-62% Unsold Inventory (reduction)
+34% Lead-to-Close Rate (improvement)
-28% Marketing CPA (cost savings)
01 The Challenge
Rio Grande Builders is a mid-size residential construction company operating across Hidalgo, Cameron, and Starr counties. They build 40-60 homes per year — primarily single-family starter homes and mid-range custom builds. Their sales pipeline was entirely relationship-driven: the owner and two sales reps relied on word of mouth, drive-by lot scouting, and gut instinct to decide where to build next.
The problems were concrete: they broke ground on a 12-home subdivision in a neighborhood where demand had already peaked, leaving 4 units unsold for over 9 months. Meanwhile, a competitor moved into an adjacent ZIP code that showed clear growth signals they had missed. They had no systematic way to identify which neighborhoods were heating up, which leads were most likely to convert, or how to allocate their limited marketing budget across a three-county footprint.
Data Landscape
02 Our Approach
We framed this as three interconnected modeling problems: neighborhood demand scoring to identify where to build, lead scoring to identify who to sell to, and budget optimization to allocate marketing spend efficiently. Each model feeds the next, and all surface through a unified dashboard and CRM integration.
- XGBoost Classifier — gradient-boosted lead scoring model trained on 3 years of CRM data to rank prospects 0-100
- Prophet + GeoPandas — time-series permit forecasting with spatial smoothing to capture neighborhood spillover effects
- Mapbox GL JS — interactive census tract map colored by demand score for the sales team's weekly planning
- scipy Optimization — constrained marketing budget allocation across ZIP codes weighted by demand score and channel ROI
- HubSpot API — automated lead score injection into the CRM, replacing the shared Excel file with structured pipeline tracking
County Permit Data: 3 counties, scraped weekly
Feature Engineering: Census + MLS + Trends
Demand Scoring: per census tract, 6-12 months
Lead Model: XGBoost, 0-100 score
Dashboard + CRM: React + HubSpot
03 Key Findings
Neighborhood Demand Ranking
Census tracts ranked by 12-month demand score (0-100). Scores combine permit velocity, MLS absorption rate, population growth, and spatial spillover from adjacent tracts. The top 6 tracts account for over 60% of near-term opportunity.
Lead Score Distribution: Before vs. After
Before ML scoring, leads were treated nearly equally — a flat distribution with no clear separation. After deployment, the model creates a bimodal split: low-probability leads cluster below 20, while high-value prospects concentrate above 80, letting the sales team focus their time.
Permit Volume Forecast
36 months of historical county permit filings with a 12-month Prophet forecast and 80% confidence band. The model captures the seasonal spring-summer construction surge and projects continued growth into 2025.
04 Business Impact
Projected Annual Value
62% reduction in unsold inventory within 6 months
The demand scoring map became the centerpiece of the owner's weekly planning meetings. Instead of debating which neighborhoods "felt hot," the team now reviews tract-level scores updated every Monday morning. The first decision it influenced: they pivoted a planned 8-unit subdivision from a cooling tract to one ranked in the top 5 — all 8 units were under contract within 4 months.
Lead scoring changed how the sales reps spend their mornings. With scores auto-populated in HubSpot, they sort by priority and work the top 20 first. The +34% lift in lead-to-close rate came not from getting better leads, but from spending more time on the right ones.
05 Technical Details
Lead Scoring Model (XGBoost)
- Features: lead_source, time_to_first_contact, tract_demand_score, median_income, referral_flag, season
- Target: binary (converted vs. not), outputs calibrated probability scaled 0-100
- Evaluation: AUC = 0.81, precision@top-20% = 0.67 (5-fold CV)
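To make the scoring and evaluation terms above concrete, here is a minimal sketch of how a calibrated probability could be scaled to the 0-100 score and how precision@top-20% could be computed. The function names and the rounding choice are illustrative assumptions, not the production code.

```python
def to_score(probability: float) -> int:
    """Scale a calibrated probability in [0, 1] to an integer 0-100 score.

    Rounding to the nearest integer is an assumption for illustration.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return round(probability * 100)


def precision_at_top(probabilities, converted, fraction=0.20):
    """Share of leads that actually converted among the top `fraction` by score.

    probabilities: predicted conversion probabilities, one per lead
    converted:     matching 0/1 (or False/True) outcomes
    """
    if len(probabilities) != len(converted):
        raise ValueError("inputs must have the same length")
    if not probabilities:
        raise ValueError("need at least one lead")
    # Always evaluate at least one lead, even for very small samples.
    k = max(1, int(len(probabilities) * fraction))
    ranked = sorted(
        zip(probabilities, converted), key=lambda pair: pair[0], reverse=True
    )
    top = ranked[:k]
    return sum(1 for _, outcome in top if outcome) / k
```

Read this way, a precision@top-20% of 0.67 means that roughly two of every three leads in the highest-scoring fifth went on to close.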
Demand Forecast (Prophet + GeoPandas)
- Granularity: monthly permit volume per census tract
- Spatial smoothing: inverse-distance weighting from adjacent tracts
- Accuracy: MAPE = 11% on 12-month holdout across 24 tracts
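The spatial smoothing and accuracy metric above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions: the `self_weight` blend between a tract's own value and its neighbors, the `power` exponent, and the use of centroid-to-centroid distance are hypothetical choices, since the source only specifies inverse-distance weighting from adjacent tracts.

```python
import math


def idw_smooth(values, centroids, neighbors, self_weight=0.5, power=1.0):
    """Blend each tract's value with an inverse-distance-weighted neighbor mean.

    values:    {tract_id: permit volume or forecast}
    centroids: {tract_id: (x, y)} in a projected coordinate system
    neighbors: {tract_id: [adjacent tract_ids]}
    self_weight and power are illustrative assumptions, not source values.
    """
    smoothed = {}
    for tract, value in values.items():
        weights = {}
        for other in neighbors.get(tract, []):
            distance = math.dist(centroids[tract], centroids[other])
            if distance > 0:
                weights[other] = 1.0 / distance ** power
        total = sum(weights.values())
        if total == 0:
            # No usable neighbors: keep the tract's own value unchanged.
            smoothed[tract] = value
            continue
        neighbor_mean = sum(w * values[o] for o, w in weights.items()) / total
        smoothed[tract] = self_weight * value + (1 - self_weight) * neighbor_mean
    return smoothed


def mape(actual, forecast):
    """Mean absolute percentage error, in percent, skipping zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("need at least one non-zero actual value")
    return 100 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)
```

Closer neighbors pull harder on a tract's smoothed value, which is how growth in one tract can lift the demand estimate for the tract next door.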
Marketing Optimization (scipy)
- Method: constrained linear programming via scipy.optimize.linprog
- Constraints: total monthly budget cap, minimum spend per active ZIP
- Objective: maximize expected conversions weighted by tract demand score
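As a rough illustration of the setup above, the allocation can be written as a small linear program. The sketch assumes a hypothetical `conv_per_dollar` estimate per ZIP code and adds a per-ZIP spending cap (`max_spend`) that is not among the listed constraints; without some cap, a purely linear objective sends all discretionary budget to the single best ZIP.

```python
from scipy.optimize import linprog


def allocate_budget(conv_per_dollar, demand_score, total_budget,
                    min_spend, max_spend):
    """Split a monthly budget across ZIP codes with a linear program.

    conv_per_dollar: expected conversions per dollar, per ZIP (hypothetical)
    demand_score:    0-100 demand score per ZIP
    max_spend:       per-ZIP cap, an assumption added for this sketch
    """
    n = len(conv_per_dollar)
    # linprog minimizes, so negate the demand-weighted return per dollar.
    cost = [-(rate * score / 100.0)
            for rate, score in zip(conv_per_dollar, demand_score)]
    # Single inequality row: total spend must not exceed the budget cap.
    a_ub = [[1.0] * n]
    b_ub = [total_budget]
    # Minimum spend per active ZIP, plus the assumed upper cap.
    bounds = [(min_spend, max_spend)] * n
    result = linprog(cost, A_ub=a_ub, b_ub=b_ub, bounds=bounds, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return list(result.x)
```

With three ZIP codes, a 10,000 budget, a 1,000 floor, and a 6,000 cap, the solver fills the strongest ZIP to its cap, holds the weakest at its floor, and gives the remainder to the middle one.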

