Project Overview

Business Requirements

  1. Sales by Gender and Product Category: A dashboard showing the total products sold, total sales revenue, and a gender split among customers.
  2. Data Filtering: Ability to filter the data by product category, gender, and date.
  3. User-Friendly Interface: Stakeholders should have access to an easy-to-use interface for making queries.

Solution Overview

my-project-architecture
  1. Data Ingestion:
    • Extract customer and sales data from an on-premises SQL database.
    • Load the data into Azure Data Lake Storage (ADLS) using Azure Data Factory (ADF).
  2. Data Transformation:
    • Use Azure Databricks to clean and transform the data.
    • Organize the data into Bronze, Silver, and Gold layers for raw, cleansed, and aggregated data respectively.
  3. Data Loading and Reporting:
    • Load the transformed data into Azure Synapse Analytics.
    • Build a Power BI dashboard to visualize the data, allowing stakeholders to explore sales and demographic insights.
  4. Automation:
    • Schedule the pipeline to run daily, ensuring that the data and reports are always up-to-date.

Technology Stack

  • Azure Data Factory (ADF): For orchestrating data movement and ingestion.
  • Azure Data Lake Storage (ADLS): For storing raw and processed data.
  • Azure Databricks: For data transformation and processing.
  • Azure Synapse Analytics: For data warehousing and SQL-based analytics.
  • Power BI: For data visualization and reporting.
  • Azure Key Vault: For securely managing credentials and secrets.
  • SQL Server (On-Premises): Source of customer and sales data.

Setup Instructions

  • An Azure account with sufficient credits.
  • Access to an on-premises SQL Server database.
  1. Create Resource Group: Set up a new resource group in Azure.
  2. Provision Services:
    • Create an Azure Data Factory instance.
    • Set up Azure Data Lake Storage with bronze, silver, and gold containers.
    • Set up an Azure Databricks workspace and Synapse Analytics workspace.
    • Configure Azure Key Vault for secret management.
  1. Set up SQL Server: Install SQL Server and SQL Server Management Studio (SSMS). Restore the AdventureWorks database.
  2. Ingest Data with ADF: Create pipelines in ADF to copy data from SQL Server to the bronze layer in ADLS.
my-adventureworks-database-on-sql-server
my-rg-and-adf
my-sa
my-azure-databricks-ws
my-synapse
my-containers
my-key-vault
my-all-resources-in-rg
my-iam-role-for-key-vault
my-ingestion-pipeline
my-bronze-layer-data
  1. Mount Data Lake in Databricks: Configure Databricks to access ADLS.
  2. Transform Data: Use Databricks notebooks to clean and aggregate the data, moving it from bronze to silver and then to gold.
my-silver-container
my-extended-pipeline
  1. Load Data into Synapse: Set up a Synapse SQL pool and load the gold data for analysis.
  2. Create Power BI Dashboard: Connect Power BI to Synapse and create visualizations based on business requirements.
my-synapse-views
my-dynamic-views
my-all-tables-views-in-synapse
My-power-bi-dashboard
  1. Schedule Pipelines: Use ADF to schedule the data pipelines to run daily.
  2. Monitor Pipeline Runs: Use the monitoring tools in ADF and Synapse to ensure successful pipeline execution.
  1. Manage Access: Set up role-based access control (RBAC) using Azure Entra ID (formerly Active Directory).
  1. Trigger and Test Pipelines: Insert new records into the SQL database and verify that the entire pipeline runs successfully, updating the Power BI dashboard.

Conclusion

By Akshay Tekam

software developer, Data science enthusiast, content creator.

Leave a Reply

Your email address will not be published. Required fields are marked *