Model Card: Combined Demand Forecaster

Model Summary

Overview

Property	Value
Model Name	CombinedForecaster
Version	1.0
Type	Regression (Multi-Horizon Time Series Forecasting)
Architecture	Dual LightGBM (Short-term single model + Long-term 3-model ensemble) with anomaly smoothing and holiday overrides
File	`src/main_module/workforce/combined_forecaster.py`
Saved Model	`scripts/combined_forecast_model.pkl`

Description

The Combined Demand Forecaster unifies the best elements of two predecessor models — the HybridForecaster (multi-dataset, multi-horizon LightGBM architecture) and the Dynamic Weeks Forecaster (anomaly smoothing, major/minor holiday distinction, holiday profile overrides, and tax-cycle features). It predicts 30-minute interval call volume across two horizons: a short-term model (< 7 days ahead) using recent lags and operational features, and a long-term 3-model ensemble (≥ 7 days ahead) using historical patterns and year-over-year indicators. On major holidays, ML predictions are bypassed in favor of historical holiday profiles for more reliable estimates.

Architecture

┌──────────────────────────────────────────────────────────────────┐
│                   COMBINED FORECASTER v1                         │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │               DATA PREPROCESSING                           │  │
│  │  • Anomaly smoothing (known outliers interpolated)          │  │
│  │  • Multi-dataset merge (datasets 1, 3, 4)                 │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │              PREDICTION ROUTING                            │  │
│  │  1. Major holiday? → Holiday profile lookup (bypass ML)    │  │
│  │  2. Horizon < 7 days? → Short-Term Model                  │  │
│  │  3. Horizon ≥ 7 days? → Long-Term Ensemble                │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
│  ┌──────────────────────┐     ┌──────────────────────────────┐  │
│  │  SHORT-TERM MODEL    │     │  LONG-TERM ENSEMBLE          │  │
│  │  (1× LGBMRegressor)  │     │  (3× LGBMRegressor, avg)    │  │
│  ├──────────────────────┤     ├──────────────────────────────┤  │
│  │  • 800 estimators    │     │  Model A: 1500 est, lr=0.015 │  │
│  │  • lr = 0.03         │     │  Model B: 1500 est, lr=0.015 │  │
│  │  • 127 leaves        │     │  Model C: 1500 est, lr=0.02  │  │
│  │  • Early stopping    │     │  • All with early stopping   │  │
│  │  • Linear recency    │     │  • Quadratic recency weights │  │
│  │    weights            │     │  • L1=1.0, L2=2.0 reg       │  │
│  │                      │     │  • Predictions averaged      │  │
│  │  (55 features)       │     │  (45 features)               │  │
│  └──────────────────────┘     └──────────────────────────────┘  │
│                                                                  │
│  ┌────────────────────────────────────────────────────────────┐  │
│  │             HISTORICAL PATTERN LOOKUP                      │  │
│  │  Pre-computed: dow×hour, month×dow×hour, week-of-year,    │  │
│  │  quarter×dow, time-slot means/stds, YoY patterns          │  │
│  │  + Major holiday profiles (48 intervals per holiday)      │  │
│  └────────────────────────────────────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

What It Combines

Feature	Source: HybridForecaster	Source: Dynamic Weeks
Multi-horizon (ST + LT)	✓
LightGBM + early stopping	✓
3-model LT ensemble	✓
Multi-dataset integration (3 parquets)	✓
Recency-weighted training	✓
Channel mix features	✓
Operational metric features	✓
Year-aligned train/test split	✓
RobustScaler	✓
Anomaly smoothing (outliers interpolation)		✓
Major vs. minor holiday distinction		✓
Holiday profile overrides at inference		✓
`is_january` feature		✓
`is_post_tax_drop` feature		✓

Inputs and Outputs

Input:

Historical call center data (Parquet format, from dataset_1_call_related.parquet)
Supplementary operational data (dataset_3_historical_outcomes.parquet, dataset_4_expert_state_interval.parquet)
Target datetime for prediction
Forecast horizon (automatic routing)

Short-Term Model Features (55 total):

Category	Features	Count
Temporal	hour, minute, day_of_week, day_of_month, month, time_slot, week_of_year, day_of_year	8
Tax/Holiday	is_holiday, is_major_holiday, is_january, days_to_tax_deadline, tax_urgency, is_post_tax_drop	6
Cyclical Encoding	hour_sin, hour_cos, dow_sin, month_sin, month_cos	5
Lag Features	lag_1, lag_2, lag_4, lag_48, lag_336, lag_672, lag_same_time_yesterday, lag_same_time_last_week	8
Difference Features	diff_1, diff_48, diff_336	3
Rolling Statistics	rolling_mean_4/12/48/336, rolling_std_4/48, rolling_max_4	7
EWM Features	ewm_mean_12, ewm_mean_48	2
Trend Features	hourly_trend, daily_trend	2
Advanced	volatility_ratio, momentum	2
Channel Mix	inbound_ratio, chat_ratio, callback_ratio	3
Operational Lags	lag_transfer_rate, lag_fcr_rate, lag_mean_hold, lag_active_experts, lag_mean_occupancy, lag_total_avail	6
Operational Rolling	rolling_experts_48, rolling_occupancy_48	2
Year-over-Year	yoy_same_dow_hour_mean	1

Long-Term Model Features (45 total):

Category	Features	Count
Temporal	hour, day_of_week, day_of_month, month, time_slot, week_of_year, day_of_year	7
Tax/Holiday	is_holiday, is_major_holiday, is_january, days_to_tax_deadline, tax_urgency, is_post_tax_drop	6
Cyclical Encoding	hour_cos, dow_sin, month_sin, month_cos	4
Historical Aggregates	hist_dow_hour_mean/std/median, hist_month_dow_hour_mean/std, hist_month_mean, hist_time_slot_mean, hist_week_of_year_mean, hist_quarter_dow_mean	9
Long Rolling	rolling_mean_336/672, ewm_mean_336/672	4
Channel Mix	inbound_ratio, chat_ratio, callback_ratio	3
Historical Operational	hist_transfer_rate, hist_fcr_rate, hist_mean_hold, hist_mean_experts, hist_mean_occupancy	5
Year-over-Year	yoy_same_dow_hour_mean, yoy_same_week_mean	2
Recent Window	recent_quarter_mean, recent_month_mean, hist_recent_dow_hour_mean	3
Slot Aggregates	hist_dow_time_slot_mean, hist_month_time_slot_mean	2

Output:

Predicted call count (integer, clipped ≥ 0) for a 30-minute interval

Model Usage and Limitations

Intended Usage

Primary Use: Multi-horizon call volume forecasting for Intuit QuickBooks / SBSEG support
Users: Call center managers, workforce planners, capacity analysts
Applications:
- Short-term scheduling (1–7 days ahead)
- Long-term capacity planning (1–4+ weeks ahead)
- Seasonal workforce budgeting (tax season preparation)
- Integration with CallCenterEmulator and SupplyOptimizer for staffing recommendations

Benefits Over Predecessor Models

Anomaly Robustness: Known data outliers (e.g., 2025-08-29) are automatically smoothed via interpolation, preventing the model from training on corrupted intervals
Holiday Accuracy: Major holidays (New Year’s, Thanksgiving, Christmas) use historical profile lookup instead of ML prediction, which is more reliable for these rare, extreme-pattern days
Richer Calendar Signals: is_major_holiday, is_january, and is_post_tax_drop capture domain-specific seasonal patterns that the pure HybridForecaster lacked
Lower Extreme Errors: Anomaly smoothing and holiday overrides produce a lower RMSE than the HybridForecaster, meaning fewer large prediction misses
All HybridForecaster Strengths Retained: Multi-dataset integration, dual-horizon architecture, recency weighting, LightGBM ensemble, operational features

Limitations

Year-over-Year Drift: A 5–15% volume decline was observed between 2024 and 2025; recency weighting mitigates but does not fully eliminate this
Business Hours: Assumes UTC timestamps with Pacific Time business hours (UTC 13:00–01:00, Mon–Fri)
Training Data Requirement: Requires data spanning at least two years for YoY features
Long-Term Accuracy: WMAPE of ~13% for ≥7-day forecasts reflects inherent difficulty of long-horizon prediction
Domain Specific: Optimized for Intuit QB/SBSEG call patterns; requires retraining for other domains
Known Anomalies List: The _KNOWN_ANOMALIES list must be manually updated when new outliers are identified

Out-of-Scope Uses

Sub-interval predictions (less than 30 minutes)
Individual call outcome or duration prediction
Non-call-center demand forecasting without retraining
Real-time anomaly detection

Evaluation

Performance Metrics

Test Set Performance (Train: Jan–Oct 2024, Test: Jan–Oct 2025):

Metric	Short-Term (< 7 days)	Long-Term (≥ 7 days)
MAE	28.56 calls	119.22 calls
RMSE	58.96 calls	220.86 calls
R²	0.9979	0.9705
WMAPE	3.11%	13.00%

Head-to-Head Comparison (Same Test Set)

Model	MAE	RMSE	R²	WMAPE	Features
Combined (ST)	28.56	58.96	0.9979	3.11%	55
Hybrid (ST)	27.20	91.47	0.9950	2.95%	52
Combined (LT)	119.22	220.86	0.9705	13.00%	45
Hybrid (LT)	117.35	247.37	0.9635	12.74%	42
Dynamic Weeks (RF+GBM)	91.74	188.30	0.9786	10.01%	15

Key Observations:

Combined achieves 35% lower RMSE than Hybrid on short-term (58.96 vs 91.47), meaning far fewer large prediction errors
Combined achieves 11% lower RMSE than Hybrid on long-term (220.86 vs 247.37)
Combined trades a minor MAE/WMAPE increase (~0.2-0.3%) for substantially better outlier handling
Dynamic Weeks is a single-horizon model with no short/long distinction; its 10% WMAPE is far worse than either specialized model’s short-term performance

Top Features

Short-Term Model (Top 10):

Rank	Feature	Category
1	diff_1	Difference
2	diff_336	Difference (1 week)
3	yoy_same_dow_hour_mean	Year-over-Year
4	lag_1	Lag (30 min ago)
5	lag_336	Lag (7 days ago)
6	diff_48	Difference (1 day)
7	lag_672	Lag (14 days ago)
8	inbound_ratio	Channel Mix
9	callback_ratio	Channel Mix
10	day_of_month	Temporal

Long-Term Model (Top 10):

Rank	Feature	Category
1	hist_month_dow_hour_mean	Historical Aggregate
2	callback_ratio	Channel Mix
3	hist_week_of_year_mean	Historical Aggregate
4	hist_month_dow_hour_std	Historical Aggregate
5	yoy_same_dow_hour_mean	Year-over-Year
6	day_of_month	Temporal
7	inbound_ratio	Channel Mix
8	day_of_year	Temporal
9	ewm_mean_336	Long Rolling
10	rolling_mean_336	Long Rolling

Evaluation Methodology

Train/Test Split: Year-aligned with shared complete months (Jan–Oct 2024 for training, Jan–Oct 2025 for testing) to ensure consistent seasonal distribution
Incomplete Month Handling: If the last month in the test year has fewer than 28 days of data, it is dropped
Recency Weighting: Short-term uses linear weights (0.2 + 0.8 × normalized_index); long-term uses quadratic weights (0.1 + 0.9 × normalized_index²)
Primary Metric: WMAPE (interpretable for staffing); MAE, RMSE, and R² also reported
Anomaly Smoothing: Known outlier dates are interpolated before training, preventing corrupted data from affecting model quality

Implementation

Software Dependencies

Python >= 3.9
numpy >= 1.26.0
pandas >= 2.2.0
scikit-learn >= 1.5.0
lightgbm >= 4.0.0
pyarrow >= 14.0.0

Training Configuration

Parameter	Value
Training Data	Jan–Oct 2024 (14,640 intervals)
Test Data	Jan–Oct 2025 (14,592 intervals)
Feature Scaling	RobustScaler (outlier-resistant)
Short-Term Threshold	7 days
Short-Term Recency Weights	Linear: 0.2 + 0.8 × (i / max_i)
Long-Term Recency Weights	Quadratic: 0.1 + 0.9 × (i / max_i)²
Early Stopping	50 rounds (both models)
Anomaly Smoothing	Linear interpolation for known outlier dates
Training Time	~50–60 seconds on Apple M-series

Model Hyperparameters

Short-Term (LGBMRegressor):

n_estimators=800, learning_rate=0.03, num_leaves=127,
max_depth=9, min_child_samples=15, subsample=0.8,
colsample_bytree=0.7, reg_alpha=0.05, reg_lambda=0.5,
early_stopping_rounds=50

Long-Term Ensemble (3× LGBMRegressor):

Model A: n_estimators=1500, lr=0.015, num_leaves=200, max_depth=9,
         subsample=0.8, colsample=0.6, min_child=15, seed=42
Model B: n_estimators=1500, lr=0.015, num_leaves=200, max_depth=9,
         subsample=0.7, colsample=0.5, min_child=15, seed=7
Model C: n_estimators=1500, lr=0.02,  num_leaves=127, max_depth=8,
         subsample=0.85, colsample=0.7, min_child=20, seed=123
All: reg_alpha=1.0, reg_lambda=2.0, early_stopping_rounds=50

Usage

Training:

from main_module.workforce.combined_forecaster import CombinedForecaster

forecaster = CombinedForecaster()
forecaster.train("data/raw/dataset_1_call_related.parquet", train_year=2024, test_year=2025)
forecaster.save_model("scripts/combined_forecast_model.pkl")

Inference:

forecaster = CombinedForecaster()
forecaster.load_model("scripts/combined_forecast_model.pkl")

prediction = forecaster.predict("2025-03-15 14:00:00")

day_forecast = forecaster.predict_day("2025-03-15")

Model Data

Training Data

Property	Value
Source	Intuit call center records (QuickBooks / SBSEG)
Primary Dataset	`dataset_1_call_related.parquet`
Supplementary	`dataset_3_historical_outcomes.parquet`, `dataset_4_expert_state_interval.parquet`
Time Period	November 2023 – November 2025
Total 30-min Intervals	34,512
Train Intervals	14,640 (Jan–Oct 2024)
Test Intervals	14,592 (Jan–Oct 2025)

Data Preprocessing Pipeline

Raw Parquet (call-level)
  → Aggregate to 30-min intervals (count + channel ratios)
  → Smooth known anomalies (interpolation)
  → Merge operational metrics from datasets 3 & 4
  → Create base temporal features (incl. holiday/tax features)
  → Compute historical patterns + holiday profiles (training data only)
  → Add lag, rolling, YoY, and operational features
  → Feature scaling (RobustScaler)

Known Anomalies Handled

Date	Description
2025-08-29	Full-day data spike; all intervals interpolated

Multi-Dataset Integration

Dataset	Features Extracted
dataset_1 (calls)	call_count, inbound_ratio, chat_ratio, callback_ratio
dataset_3 (outcomes)	transfer_rate, fcr_rate (first contact resolution), mean_hold
dataset_4 (expert state)	active_experts, mean_occupancy, total_available_time

Integration

System Architecture

The CombinedForecaster is a drop-in replacement for the HybridForecaster in the three-component pipeline:

CombinedForecaster.predict(datetime) → predicted_demand (int)
         │
         ▼
CallCenterEmulator.simulate_interval(supply, demand) → EmulatorMetrics
         │
         ▼
SupplyOptimizer.optimize(demand, constraints) → OptimalSupply (headcount)

Deployment Options

Method	Description
FastAPI Backend	`src/main_module/api/main.py` — serves REST API at port 8000; loads pickle on startup
Streamlit Dashboard	`scripts/dashboard.py` — interactive Python dashboard at port 8501
React Dashboard	`src/main_module/visualization/` — TypeScript frontend calling FastAPI at port 3000
Docker Compose	`docker-compose.yml` — containerized backend + frontend
CLI Pipeline	`scripts/run_pipeline.py` — train, forecast, and optimize from command line

from main_module.workforce.combined_forecaster import CombinedForecaster
forecaster = CombinedForecaster()

Ethics and Safety

Privacy Considerations

No PII used in features or predictions
All predictions are aggregated at 30-minute interval level
Model state does not contain customer information

Fairness

Predictions are volume-based, not individual-level
No demographic features used
Applies equally across communication channels (inbound, chat, callback)

Transparency

Full feature list documented above
Feature importance computed and reported after each training run
Training/test split methodology ensures no data leakage
Year-over-year volume drift explicitly documented
Known anomalies and their handling are documented