Data

The datasets for this project contain daily call center operational metrics with complete coverage across roughly two years of operations, from November 27, 2023, to November 15, 2025.

Data Inventory

Note: The actual dataset files have been excluded from this repository for confidentiality reasons.

| Dataset | Source | Notes |
|---|---|---|
| Call-Related Data | Routing / Telephony Logs | Granular interaction leg details. Contains customer_id and expert_id (needs masking). |
| Expert Metadata | WFM / HR Systems | Expert lifecycle, status, and aggregated performance metrics. |
| Historical Outcomes | CRM / Ticketing System | Session-level resolution, transfer counts, and outcomes. |
| Expert State & Interval | Presence / Activity Logs | 30-minute interval state, availability, and occupancy tracking. |
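The inventory notes that customer_id and expert_id need masking before the data is shared or analyzed. One common way to do this is keyed hashing, which replaces each identifier with a stable pseudonym so that joins across datasets still work. The sketch below is only an illustration under that assumption: `mask_id`, `mask_row`, and the placeholder key are hypothetical names, not the project's actual masking procedure.

```python
import hashlib
import hmac


def mask_id(raw_id: str, secret_key: bytes, length: int = 16) -> str:
    """Return a stable pseudonym for raw_id using keyed HMAC-SHA256.

    The same (raw_id, secret_key) pair always yields the same pseudonym,
    so masked IDs can still be used as join keys.
    """
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


def mask_row(row: dict, secret_key: bytes,
             fields=("customer_id", "expert_id")) -> dict:
    """Return a copy of row with the listed identifier fields masked."""
    masked = dict(row)
    for field in fields:
        value = masked.get(field)
        if value is not None:  # leave missing identifiers untouched
            masked[field] = mask_id(str(value), secret_key)
    return masked


# Illustrative values only; a real key should come from a secret store.
example_key = b"replace-with-a-secret-key"
example_row = {"cc_id": "c-001", "customer_id": "cust-123", "expert_id": "exp-9"}
print(mask_row(example_row, example_key))
```

Using the same key for every dataset keeps masked expert_id values consistent between, for example, Call-Related Data and Expert Metadata.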

Data Dictionaries

1. Call-Related Data

Contains granular metrics for individual interaction legs.

| Data Field | Data Type | Description |
|---|---|---|
| tax_year | INT | Reporting year bucket. |
| cc_id | STRING/ID | Canonical contact identifier. |
| engagement_id | STRING | Engagement/workflow identifier. |
| interaction_date | DATE | Date anchor for the interaction. |
| arrival_time_utc | TIMESTAMP | Earliest observed arrival. |
| start_time_utc | TIMESTAMP | Earliest agent-leg start. |
| end_time_utc | TIMESTAMP | Latest agent-leg end. |
| customer_id | STRING | Customer identifier. |
| customer_id_source | STRING | Source of customer_id. |
| expert_id | STRING | Handling expert identifier. |
| answered_flag | STRING | Yes/No answered indicator. |
| product_group_sku | STRING | Product/routing grouping. |
| communication_channel_type | STRING | Channel/initiation descriptor. |
| communication_leg_direction | STRING | Interaction leg direction. |
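The three timestamp fields above can be combined into durations. The following is a minimal sketch, assuming timestamps are ISO 8601 strings in UTC and that answered_flag takes the value "Yes" for answered contacts. The output names `wait_seconds` and `agent_span_seconds` are hypothetical and are not columns in the dataset.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 string; treat values without an offset as UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt


def leg_durations(row: dict) -> Optional[dict]:
    """Derive wait and agent-span durations for one answered contact.

    Returns None when the contact was not answered or a timestamp is missing,
    since no agent-leg start or end exists in that case.
    """
    if row.get("answered_flag") != "Yes":
        return None
    needed = ("arrival_time_utc", "start_time_utc", "end_time_utc")
    if any(not row.get(name) for name in needed):
        return None
    arrival = parse_utc(row["arrival_time_utc"])
    start = parse_utc(row["start_time_utc"])
    end = parse_utc(row["end_time_utc"])
    return {
        # Assumption: wait is arrival to earliest agent-leg start.
        "wait_seconds": (start - arrival).total_seconds(),
        # Assumption: span is earliest agent-leg start to latest agent-leg end.
        "agent_span_seconds": (end - start).total_seconds(),
    }


example = {
    "answered_flag": "Yes",
    "arrival_time_utc": "2024-01-15T14:00:00",
    "start_time_utc": "2024-01-15T14:02:30",
    "end_time_utc": "2024-01-15T14:10:30",
}
print(leg_durations(example))  # {'wait_seconds': 150.0, 'agent_span_seconds': 480.0}
```

Because start_time_utc and end_time_utc are the earliest start and latest end across agent legs, the span may include hold or transfer time and should not be read as pure talk time.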

2. Expert Metadata

Captures the lifecycle, status, and aggregated metrics of call center experts.

| Data Field | Data Type | Description |
|---|---|---|
| tax_year | INT | Reporting year bucket. |
| expert_id | STRING/ID | Canonical expert identifier. |
| first_started_date | DATE | Lifecycle start proxy date. |
| first_active_date | DATE | Active/production proxy date. |
| latest_termination_date | DATE | Lifecycle end date. |
| tenure_as_of_date | DATE | Date used as tenure boundary. |
| tenure_is_ongoing_flg | INT | Ongoing-tenure indicator. |
| tenure_days_since_start | INT | Days from first_started_date. |
| tenure_days_since_first_active | INT | Days from first_active_date. |
| agent_status | STRING | Current lifecycle status label. |
| active_flg | STRING | Current active indicator. |
| business_segment | STRING | Lifecycle business segment. |
| expert_segment | STRING | Lifecycle expert segment. |
| access_rule | STRING | Lifecycle access rule/profile. |
| routing_profiles_seen_raw | STRING | Observed routing profiles (raw). |
| routing_profiles_seen_clean | STRING | Observed routing profiles (clean). |
| skill_certifications | STRING | Combined skill descriptor. |
| contacts | BIGINT | Count of contact-level rows. |
| answered_contacts | BIGINT | Count of answered contacts. |
| average_handle_time_seconds | DOUBLE | Average handle seconds. |
| average_hold_time_seconds | DOUBLE | Average hold seconds. |
| resolution_rate | DOUBLE | Resolution proxy percentage. |
| transfer_rate | DOUBLE | Transfer percentage. |
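The tenure fields above appear to be related to one another. The sketch below shows one plausible reading, not the confirmed derivation: it assumes the tenure boundary is latest_termination_date when one exists and a snapshot date otherwise, and that tenure is ongoing when no termination date is recorded. The function name `tenure_fields` and the `snapshot_date` parameter are hypothetical.

```python
from datetime import date
from typing import Optional


def tenure_fields(first_started_date: date,
                  first_active_date: Optional[date],
                  latest_termination_date: Optional[date],
                  snapshot_date: date) -> dict:
    """Derive tenure columns for one expert under the stated assumptions."""
    # Assumption: no termination date means the expert's tenure is ongoing.
    ongoing = latest_termination_date is None
    boundary = snapshot_date if ongoing else latest_termination_date
    return {
        "tenure_as_of_date": boundary,
        "tenure_is_ongoing_flg": int(ongoing),
        "tenure_days_since_start": (boundary - first_started_date).days,
        "tenure_days_since_first_active": (
            (boundary - first_active_date).days
            if first_active_date is not None else None
        ),
    }


example = tenure_fields(
    first_started_date=date(2024, 1, 1),
    first_active_date=date(2024, 1, 15),
    latest_termination_date=None,
    snapshot_date=date(2024, 3, 1),
)
print(example["tenure_days_since_start"])         # 60
print(example["tenure_days_since_first_active"])  # 46
```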

3. Historical Outcomes

Records the results, transfers, and resolution behaviors of customer sessions.

| Data Field | Data Type | Description |
|---|---|---|
| tax_year | INT | Reporting year bucket. |
| calendar_year | INT | Calendar year. |
| session_contact_id | STRING | Canonical session key. |
| expert_assigned_id | STRING | Final assigned expert ID. |
| resolution_outcome | STRING | Outcome classification. |
| transfer_destination | STRING | Destination queue/profile. |
| transfer_count | BIGINT | Count of transfer events. |
| post_resolution_behavior | STRING | Repeat-contact indicator. |
| transfer_flag | STRING | Session-level transfer indicator. |
| first_call_resolution | STRING | First-call resolution proxy. |
| hold_time_seconds | DOUBLE | Total hold time seconds. |
| duration_of_call_minutes | DOUBLE | Session duration in minutes. |
| cc_id | STRING | Contact ID context. |
| engagement_id | STRING | Engagement context. |
| conversation_id | STRING | Conversation context. |
| case_number | STRING | Case identifier field. |
| customer_id | STRING | Customer identifier. |
| expert_assigned_id_source | STRING | Source of assigned expert. |
| expert_from_assignment_id | STRING | Assignment-summary expert ID. |
| expert_from_interaction_id | STRING | Interaction-derived expert ID. |
| expert_id_source_mismatch_flg | INT | Source mismatch flag. |
| expert_assigned_id_in_lifecycle_flg | INT | Lifecycle membership flag. |
| expert_assigned_id_domain_status | STRING | Domain status classification. |
| session_start_time_utc | TIMESTAMP | Session start timestamp. |
| session_end_time_utc | TIMESTAMP | Session end timestamp. |
| interaction_date | DATE | Session interaction date. |
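Session rows can be rolled up per expert. Note the unit mismatch in this table: hold_time_seconds is in seconds while duration_of_call_minutes is in minutes, so one must be converted before they are compared. The sketch below is illustrative only. It treats a session as transferred when transfer_count is greater than zero, which is an assumption, and the function `summarize_by_expert` and its output keys are hypothetical. Whether the transfer_rate column in Expert Metadata is computed this way is not stated in the source.

```python
from collections import defaultdict


def summarize_by_expert(sessions: list) -> dict:
    """Aggregate session rows into per-expert transfer and hold summaries."""
    totals = defaultdict(lambda: {"sessions": 0, "transferred": 0,
                                  "hold_seconds": 0.0, "duration_seconds": 0.0})
    for s in sessions:
        t = totals[s["expert_assigned_id"]]
        t["sessions"] += 1
        # Assumption: any positive transfer_count marks a transferred session.
        if (s.get("transfer_count") or 0) > 0:
            t["transferred"] += 1
        t["hold_seconds"] += s.get("hold_time_seconds") or 0.0
        # Convert minutes to seconds so both quantities share a unit.
        t["duration_seconds"] += (s.get("duration_of_call_minutes") or 0.0) * 60.0

    summary = {}
    for expert_id, t in totals.items():
        summary[expert_id] = {
            "sessions": t["sessions"],
            "transfer_rate_pct": 100.0 * t["transferred"] / t["sessions"],
            "hold_share_pct": (100.0 * t["hold_seconds"] / t["duration_seconds"]
                               if t["duration_seconds"] else None),
        }
    return summary


example_sessions = [
    {"expert_assigned_id": "e1", "transfer_count": 0,
     "hold_time_seconds": 30.0, "duration_of_call_minutes": 5.0},
    {"expert_assigned_id": "e1", "transfer_count": 2,
     "hold_time_seconds": 90.0, "duration_of_call_minutes": 5.0},
]
print(summarize_by_expert(example_sessions))
# {'e1': {'sessions': 2, 'transfer_rate_pct': 50.0, 'hold_share_pct': 20.0}}
```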

4. Expert State & Interval

Tracks adherence and activity states across standardized 30-minute intervals.

| Data Field | Data Type | Description |
|---|---|---|
| tax_year | INT | Reporting year bucket. |
| date | DATE | UTC interval date. |
| time_interval_30m_utc | STRING | 30-minute UTC label. |
| expert_id | STRING | Canonical expert ID. |
| total_handle_time_seconds | DOUBLE | Raw handle seconds. |
| total_available_time_seconds | DOUBLE | Raw available seconds. |
| activity_break_meal_seconds | DOUBLE | Raw break/meal seconds. |
| activity_meeting_training_seconds | DOUBLE | Raw meeting/training seconds. |
| activity_offline_unavailable_seconds | DOUBLE | Raw offline seconds. |
| activity_uncategorized_seconds | DOUBLE | Raw uncategorized seconds. |
| primary_activity_category_30m | STRING | Dominant activity category. |
| occupancy_pct | DOUBLE | Raw occupancy percentage. |
| normalization_scale_factor | DOUBLE | Scaling factor. |
| total_handle_time_seconds_normalized | DOUBLE | Normalized handle seconds. |
| total_available_time_seconds_normalized | DOUBLE | Normalized available seconds. |
| activity_break_meal_seconds_normalized | DOUBLE | Normalized break/meal seconds. |
| activity_meeting_training_seconds_normalized | DOUBLE | Normalized meeting seconds. |
| activity_offline_unavailable_seconds_normalized | DOUBLE | Normalized offline seconds. |
| activity_uncategorized_seconds_normalized | DOUBLE | Normalized uncategorized. |
| occupancy_pct_normalized | DOUBLE | Normalized occupancy. |
| interval_start_utc | TIMESTAMP | Interval start timestamp. |
| interval_end_utc | TIMESTAMP | Interval end timestamp. |
| total_state_overlap_seconds | DOUBLE | Raw total overlap seconds. |
| total_state_overlap_seconds_normalized | DOUBLE | Capped overlap seconds. |
| interval_over_30m_flg | INT | Raw overage flag. |
| interval_over_30m_normalized_flg | INT | Normalized overage flag. |
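The raw and normalized columns suggest that state seconds exceeding a 30-minute interval (1,800 seconds) are scaled down proportionally. The sketch below shows one way such a normalization could work. It assumes total_state_overlap_seconds is the sum of the six raw state columns and that the scale factor is 1800 divided by that sum when the sum exceeds 1,800, and 1 otherwise. These rules are assumptions for illustration, and `normalize_interval` is a hypothetical name, not the pipeline's actual function.

```python
INTERVAL_SECONDS = 1800.0  # length of one 30-minute interval

STATE_FIELDS = (
    "total_handle_time_seconds",
    "total_available_time_seconds",
    "activity_break_meal_seconds",
    "activity_meeting_training_seconds",
    "activity_offline_unavailable_seconds",
    "activity_uncategorized_seconds",
)


def normalize_interval(row: dict) -> dict:
    """Scale raw state seconds so their sum does not exceed 1,800 seconds."""
    # Assumption: overlap is the plain sum of the raw state columns.
    total = sum(row.get(name) or 0.0 for name in STATE_FIELDS)
    over = total > INTERVAL_SECONDS
    factor = INTERVAL_SECONDS / total if over else 1.0

    out = {name + "_normalized": (row.get(name) or 0.0) * factor
           for name in STATE_FIELDS}
    out["total_state_overlap_seconds"] = total
    out["total_state_overlap_seconds_normalized"] = min(total, INTERVAL_SECONDS)
    out["normalization_scale_factor"] = factor
    out["interval_over_30m_flg"] = int(over)
    return out


example = normalize_interval({
    "total_handle_time_seconds": 1200.0,
    "total_available_time_seconds": 600.0,
    "activity_break_meal_seconds": 600.0,
})
print(example["normalization_scale_factor"])            # 0.75
print(example["total_handle_time_seconds_normalized"])  # 900.0
```

Proportional scaling preserves the relative share of each state within the interval, which keeps ratios such as occupancy comparable between intervals that overran and those that did not.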

© 2025 UC San Diego - Data Science Capstone