Business Situation
- One of the largest US-based media companies faced significant challenges in data management and analytics.
- The client manually extracted data from disparate sources and built reports on an ad hoc basis, with no automation.
- The lack of real-time processing capabilities and a single source of truth hindered the timely generation of insights, and historical analysis was error-prone and inefficient.
SGA Approach
Technology
- Leveraged Google Cloud Platform (GCP) for scalable cloud infrastructure and utilized Google Cloud Composer for workflow orchestration
- Implemented Databricks for large-scale data processing and deployed Tableau dashboards, served within a Django web environment, for data visualization
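Cloud Composer (managed Apache Airflow) runs pipeline tasks in dependency order. The sketch below illustrates that pattern with a toy pure-Python task graph and topological runner; the task names and the runner itself are illustrative, not the client's actual Composer DAG.

```python
# Illustrative task graph for the extract -> process -> visualize pipeline.
# Keys are task names (hypothetical), values are the tasks they depend on.
tasks = {
    "extract_sources": [],
    "convert_to_parquet": ["extract_sources"],
    "run_databricks_job": ["convert_to_parquet"],
    "refresh_tableau_extracts": ["run_databricks_job"],
}

def run_order(deps):
    """Return task names in dependency order (Kahn's topological sort)."""
    remaining = {t: set(d) for t, d in deps.items()}
    order = []
    while remaining:
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cycle in task graph")
        for t in ready:
            order.append(t)
            del remaining[t]
        for d in remaining.values():
            d.difference_update(ready)
    return order
```

In production, each task would be an Airflow operator (e.g. triggering a Databricks job) and Composer would handle scheduling, retries, and monitoring; the toy runner only shows the ordering guarantee.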
AI
- Developed user-interpretable AI models for automated insight generation and built Spark-based ML models for customer lifetime value (CLTV) prediction
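A CLTV model typically consumes per-customer behavioral features such as order count, total spend, and average order value. The pure-Python stand-in below sketches that feature computation and a naive heuristic score; the production version described above would be a Spark ML regression pipeline, and the field names and multiplier here are assumptions for illustration.

```python
from collections import defaultdict

def cltv_features(orders):
    """orders: list of (customer_id, amount) pairs.
    Returns per-customer (order_count, total_spend, avg_order_value)."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for cust, amount in orders:
        counts[cust] += 1
        totals[cust] += amount
    return {c: (counts[c], totals[c], totals[c] / counts[c]) for c in counts}

def naive_cltv(features, expected_future_orders=4):
    """Heuristic CLTV: average order value times an assumed number of
    future orders. A trained Spark model would learn this relationship
    from history instead of using a fixed multiplier."""
    return {c: avg * expected_future_orders
            for c, (_, _, avg) in features.items()}
```

Usage: `naive_cltv(cltv_features([("a", 10.0), ("a", 30.0)]))` scores customer `"a"` from an average order value of 20.0.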
Data
- Implemented Python-based scripts for data extraction from multiple sources and converted the extracted data to Parquet format for efficient storage and access
- Created a centralized data lake for unified data storage and analysis
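Before landing in the data lake as Parquet, rows from heterogeneous sources must be normalized into one shared schema. The sketch below shows that unification step using stdlib parsers; the source formats, column names, and helper functions are assumptions, and the real scripts would write the unified rows out as Parquet (e.g. via pyarrow or Spark) rather than keeping them in memory.

```python
import csv
import io
import json

# Hypothetical unified schema shared by all sources.
COLUMNS = ["source", "event_id", "value"]

def from_csv(text):
    """Parse a CSV export into rows matching COLUMNS."""
    return [{"source": "csv", "event_id": r["id"], "value": float(r["value"])}
            for r in csv.DictReader(io.StringIO(text))]

def from_json(text):
    """Parse a JSON API payload into rows matching COLUMNS."""
    return [{"source": "json", "event_id": str(r["id"]), "value": float(r["value"])}
            for r in json.loads(text)]

def unify(*batches):
    """Merge batches from all sources into one schema-checked row list."""
    rows = [row for batch in batches for row in batch]
    assert all(set(row) == set(COLUMNS) for row in rows), "schema mismatch"
    return rows
```

Enforcing one schema at the extraction boundary is what lets the downstream Parquet files in the lake serve as a single source of truth.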
Key Takeaways
- Established a centralized data lake as a single source of truth for current and future analysis and reduced time to outcome (TTO) by 95% for big data processing
- Enabled real-time data processing and automated report generation and increased raw data processing capacity from 1 GB to 150 GB
- Improved data accessibility and visualization through Tableau dashboards and expanded the number of automated reports from 5 to 25