Weekly Spotify Wrapped is a data engineering project that analyzes your Spotify listening habits on a weekly basis. Kafka streams the listening data, PySpark transforms it in real time, and PostgreSQL stores the results, so the pipeline runs end to end from raw events to weekly insights.
Key Features
Stream Spotify Data
Kafka streams listening events fetched from Spotify’s API in real time, so that other services in the pipeline can consume them.
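As a rough illustration of the producing side, the sketch below encodes one play event as a Kafka key/value pair and sends it. The topic name, the event fields (`track_id`, `played_at`), and both function names are assumptions made for this example, not taken from the project. The `producer` argument is any object with `send(topic, key=..., value=...)` and `flush()`, which is the shape of kafka-python's `KafkaProducer`.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# Hypothetical topic name, chosen for illustration only.
TOPIC = "spotify.recently_played"


def encode_play_event(event: Dict[str, Any]) -> Tuple[bytes, bytes]:
    """Turn one play event into a (key, value) pair of bytes for Kafka."""
    # Keying by track ID routes all plays of one track to the same partition.
    key = str(event["track_id"]).encode("utf-8")
    # sort_keys makes the payload deterministic, which helps when debugging.
    value = json.dumps(event, sort_keys=True).encode("utf-8")
    return key, value


def publish_play_events(producer: Any, events: Iterable[Dict[str, Any]]) -> int:
    """Send every event to TOPIC and return how many were sent."""
    sent = 0
    for event in events:
        key, value = encode_play_event(event)
        producer.send(TOPIC, key=key, value=value)
        sent += 1
    producer.flush()  # wait until buffered messages have been handed off
    return sent
```

With kafka-python installed, a real producer could be created with `KafkaProducer(bootstrap_servers="localhost:9092")`, where the broker address is again an assumption, and passed in as `producer`.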
Real-time Data Transformation
PySpark handles the transformations: it cleans and processes the streamed data so that it is ready for analysis or further consumption.
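To make "cleaned" concrete, here is a minimal sketch of the kind of rules such a step might apply, written in plain Python so it runs without a Spark cluster. The field names (`track_id`, `played_at`, `ms_played`) and the 30-second skip threshold are assumptions for illustration. In PySpark, the same rules would map roughly onto `DataFrame.filter` and `DataFrame.dropDuplicates`.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, List

# Assumed threshold: plays shorter than 30 seconds are treated as skips.
MIN_PLAY_MS = 30_000


def clean_play_events(events: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop incomplete, skipped, and duplicate plays; normalize timestamps to UTC."""
    seen = set()
    cleaned = []
    for event in events:
        track_id = event.get("track_id")
        played_at = event.get("played_at")
        if not track_id or not played_at:
            continue  # incomplete record
        if event.get("ms_played", 0) < MIN_PLAY_MS:
            continue  # treated as a skip
        # Parse ISO 8601; a trailing "Z" is rewritten for older Python versions.
        ts = datetime.fromisoformat(played_at.replace("Z", "+00:00"))
        ts = ts.astimezone(timezone.utc)
        dedupe_key = (track_id, ts)
        if dedupe_key in seen:
            continue  # same track at the same instant: duplicate delivery
        seen.add(dedupe_key)
        cleaned.append({**event, "played_at": ts})
    return cleaned
```

Deduplicating on the track and timestamp pair matters because a streaming source may deliver the same event more than once.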
PostgreSQL for Storage
The pipeline stores processed data in PostgreSQL, ensuring a structured and reliable database for analysis.
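One way to keep the stored data reliable when a batch is re-run is to write rows with PostgreSQL's `INSERT ... ON CONFLICT` (an upsert) rather than a plain insert. The sketch below only builds the SQL string. The helper name, the table name, and the column names are hypothetical, and the statement assumes a unique constraint exists on the key columns.

```python
from typing import Sequence


def build_upsert_sql(
    table: str,
    columns: Sequence[str],
    key_columns: Sequence[str],
    placeholder: str = "%s",
) -> str:
    """Build an INSERT ... ON CONFLICT statement for PostgreSQL.

    Identifiers are interpolated directly, so they must be trusted constants,
    never user input. Row values are passed separately as query parameters.
    """
    col_list = ", ".join(columns)
    values = ", ".join([placeholder] * len(columns))
    conflict = ", ".join(key_columns)
    # Non-key columns are overwritten with the incoming (EXCLUDED) values.
    updates = [c for c in columns if c not in key_columns]
    if updates:
        action = "DO UPDATE SET " + ", ".join(f"{c} = EXCLUDED.{c}" for c in updates)
    else:
        action = "DO NOTHING"
    return (
        f"INSERT INTO {table} ({col_list}) VALUES ({values}) "
        f"ON CONFLICT ({conflict}) {action}"
    )


# Hypothetical table and columns, for illustration only.
UPSERT_PLAYS = build_upsert_sql(
    "plays", ["track_id", "played_at", "ms_played"], ["track_id", "played_at"]
)
```

With a driver such as psycopg2, the resulting string could be passed to `cursor.executemany(UPSERT_PLAYS, rows)`, with each row supplied as a tuple in column order.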
Weekly Insights
The pipeline is orchestrated to run once a week, so each run generates insights into that week's listening habits. You can schedule it with a tool such as Airflow or with a cron job.
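A weekly run needs to know which week it covers. The sketch below computes the most recent full Monday-to-Monday window in UTC. The function name and the choice of Monday as the week boundary are assumptions for illustration, and the cron line in the comment refers to a hypothetical script name.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple

# A cron entry that runs every Monday at 06:00 (script name is hypothetical):
#   0 6 * * 1 python run_weekly.py


def previous_week_window(today: date) -> Tuple[datetime, datetime]:
    """Return [start, end) of the last full Monday-to-Monday week, in UTC."""
    # date.weekday() is 0 for Monday, so this steps back to the current week's Monday.
    this_monday = today - timedelta(days=today.weekday())
    last_monday = this_monday - timedelta(days=7)
    start = datetime.combine(last_monday, time.min, tzinfo=timezone.utc)
    end = datetime.combine(this_monday, time.min, tzinfo=timezone.utc)
    return start, end
```

Treating the end as exclusive means consecutive weekly windows never overlap, so a play at exactly midnight on Monday belongs to one week only.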
Technologies Used
- Flask: Serving API endpoints for fetching and storing Spotify data
- Kafka: Real-time data streaming and message queuing
- PySpark: Distributed processing for data transformations
- PostgreSQL: Relational database to store the transformed data
- Gemini AI: Insight generation with custom prompts for your listening history
Conclusion
This project shows how to build a complete data pipeline with widely used data engineering tools. Flask, Kafka, PySpark, and PostgreSQL combine into a system that can be scaled and adapted to other use cases.
Check out the source code on GitHub for more details on how it was built.