
Formula 1

Data Engineering Project on Azure Cloud

This project is a big data initiative built on Azure cloud services, centered on the Data Lake architecture. It ingests Formula 1 race data from an external API, stores it as Delta tables in Azure Data Lake Storage (ADLS), and processes it for reporting and analysis.
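As a minimal sketch of the ingest step, assuming the external API returns JSON race results. The payload below is a made-up stand-in with illustrative field names, not the real API's schema:

```python
import pandas as pd

# Stand-in for the JSON returned by the external race-results API
# (hypothetical schema; the real API's field names may differ).
api_payload = [
    {"race": "Bahrain GP", "season": 2023, "driver": "Verstappen", "team": "Red Bull", "position": 1},
    {"race": "Bahrain GP", "season": 2023, "driver": "Perez", "team": "Red Bull", "position": 2},
]

# Flatten the payload into a tabular frame, the shape a Delta write expects.
results_df = pd.json_normalize(api_payload)

# On Databricks this frame would then be converted to a Spark DataFrame and
# written to ADLS as a Delta table, roughly:
#   spark.createDataFrame(results_df).write.format("delta") \
#        .mode("overwrite").save("abfss://raw@<storage-account>.dfs.core.windows.net/results")
print(results_df[["driver", "position"]])
```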


Key components and concepts utilized in the project include:


  1. Batch Load: Both full and incremental data ingestion strategies are employed.

  2. Delta Tables: Delta tables are used for efficient storage and management of data in ADLS.

  3. Azure Service Principal: Identity and access management is handled using Azure Service Principal.

  4. Key Vault: Sensitive information such as credentials and secrets is securely stored and managed in Azure Key Vault.

  5. Libraries: Data processing and manipulation are performed with libraries such as PySpark and pandas.

  6. ETL/ELT Workflow: The workflow for Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) operations is orchestrated and automated using Azure Data Factory.

  7. Analysis & Visualization: After data preparation, the data is analyzed to identify significant patterns, focusing in particular on the most dominant Formula 1 drivers and teams in the sport's history.
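The full vs. incremental batch-load idea above can be sketched with pandas. On Databricks the same upsert is typically expressed as a Delta Lake MERGE; the table, key, and column names here are illustrative assumptions:

```python
import pandas as pd

# Existing Delta table contents (from an earlier full load).
current = pd.DataFrame({
    "race_id": [1, 2],
    "winner":  ["Hamilton", "Verstappen"],
})

# New incremental batch: race 2 was corrected, race 3 is new.
incoming = pd.DataFrame({
    "race_id": [2, 3],
    "winner":  ["Perez", "Verstappen"],
})

# Upsert: incoming rows replace matching race_ids, new ones are appended.
# The Delta SQL equivalent would be roughly:
#   MERGE INTO results t USING updates s ON t.race_id = s.race_id
#   WHEN MATCHED THEN UPDATE SET * WHEN NOT MATCHED THEN INSERT *
merged = (
    pd.concat([current, incoming])
      .drop_duplicates(subset="race_id", keep="last")
      .sort_values("race_id")
      .reset_index(drop=True)
)
print(merged)
```

Keeping the last duplicate per key is what makes this an incremental upsert rather than a blind append.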


Technologies used in the project include:


  • Azure Databricks: For data processing, analytics, and collaboration.

  • Azure Key Vault: For secure storage of keys, secrets, and certificates.

  • Azure Data Factory: For orchestrating and automating data workflows.

  • Azure Data Lake Storage Gen2: As the storage solution for structured and unstructured data.

  • Power BI: For visualization and reporting.

  • Delta Lake: For efficient and reliable data lake storage and management.
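As a toy version of the dominance analysis described above, using pandas. The win counts here are a tiny made-up sample, not real championship data:

```python
import pandas as pd

# Made-up sample of race wins (illustrative only, not real results).
wins = pd.DataFrame({
    "driver": ["Schumacher", "Schumacher", "Hamilton", "Hamilton", "Hamilton"],
    "team":   ["Ferrari", "Ferrari", "Mercedes", "Mercedes", "Mercedes"],
})

# Count wins per driver and per team to surface the dominant ones;
# this is the kind of aggregate Power BI would then visualize.
driver_wins = wins["driver"].value_counts()
team_wins = wins["team"].value_counts()
print(driver_wins.idxmax(), team_wins.idxmax())
```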


