Formula 1
Data Engineering Project on Azure Cloud

This project is a big data initiative leveraging Azure cloud services, built around a Data Lake architecture. It involves ingesting Formula 1 race data from an external API, storing it as Delta tables in Azure Data Lake Storage (ADLS), and then processing it for reporting and analysis.
Key components and concepts utilized in the project include:
- Batch Load: Full and incremental data ingestion strategies are employed.
- Delta Tables: Delta tables provide efficient storage and management of data in ADLS.
- Azure Service Principal: Identity and access management is handled using an Azure service principal.
- Key Vault: Sensitive information such as credentials and secrets is securely stored and managed using Azure Key Vault.
- Libraries: Data processing and manipulation are facilitated using libraries such as the PySpark API and pandas.
- ETL/ELT Workflow: Extract, Transform, Load (ETL) and Extract, Load, Transform (ELT) workflows are orchestrated and automated using Azure Data Factory.
- Analysis & Visualization: After data preparation, analysis is conducted to identify significant patterns, particularly the most dominant Formula 1 drivers and teams throughout history.
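The full-versus-incremental batch load above can be sketched in plain Python. This is a minimal illustration, not the project's actual pipeline: the record shape, field names (`race_id`, `race_date`), and the `incremental_batch` helper are all hypothetical stand-ins for the watermark-based filtering a real ingestion job would perform before writing to a Delta table.

```python
from datetime import date

# Hypothetical raw records, as an external Formula 1 results API might return them.
api_records = [
    {"race_id": 1, "race_date": date(2021, 3, 28), "winner": "Hamilton"},
    {"race_id": 2, "race_date": date(2021, 4, 18), "winner": "Verstappen"},
    {"race_id": 3, "race_date": date(2021, 5, 2), "winner": "Hamilton"},
]

def incremental_batch(records, watermark):
    """Keep only records newer than the last successfully loaded date.

    A full load is the special case watermark=None, which passes
    every record through unchanged.
    """
    if watermark is None:
        return list(records)
    return [r for r in records if r["race_date"] > watermark]

full_load = incremental_batch(api_records, None)                 # all 3 races
delta_load = incremental_batch(api_records, date(2021, 4, 18))   # only race 3
```

In the real pipeline the watermark (for example, the latest ingested race date) would be read from the target Delta table, so each run loads only what the previous run missed.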
Technologies used in the project include:
- Azure Databricks: For data processing, analytics, and collaboration.
- Azure Key Vault: For secure storage of keys, secrets, and certificates.
- Azure Data Factory: For orchestrating and automating data workflows.
- Azure Data Lake Gen2: As the storage solution for structured and unstructured data.
- Power BI: For visualization and reporting.
- Delta Lake: For efficient and reliable data lake storage and management.
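The "dominant drivers and teams" analysis can be sketched with pandas, one of the libraries the project names. This is a toy example under stated assumptions: the `results` DataFrame and its columns (`driver`, `team`, `points`) are invented for illustration, whereas the real project would read curated Delta tables from ADLS.

```python
import pandas as pd

# Hypothetical simplified race-results table (the real data comes from
# curated Delta tables in ADLS, not an inline DataFrame).
results = pd.DataFrame({
    "driver": ["Hamilton", "Hamilton", "Verstappen", "Schumacher", "Hamilton"],
    "team":   ["Mercedes", "Mercedes", "Red Bull", "Ferrari", "Mercedes"],
    "points": [25, 25, 25, 25, 18],
})

# Rank drivers by total points to surface the most dominant ones.
dominance = (
    results.groupby("driver")["points"]
    .sum()
    .sort_values(ascending=False)
)
```

The same groupby-and-rank pattern, expressed in PySpark over the full historical dataset, feeds the Power BI dashboards described above.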