I'm Siddhartha Varanasi, a Data Engineer/Analyst with 3.5+ years of experience transforming complex data into strategic business advantages. I architect and optimize robust data solutions using Python, Azure, Snowflake, SQL, and PySpark.
My expertise has driven tangible results, including reducing data processing times by 40% and enhancing data accuracy by 30%, directly fueling operational efficiency and data-driven decision-making. As a recent Master's graduate, I continue to build on that foundation, developing AI-driven data solutions and working with natural language processing for AI assistants. I am Snowflake Certified and Microsoft Certified.
M.S. in Computer Science, University of Cincinnati (4.0 GPA)
August 2023 - May 2025
Certifications
Snowflake Certified: SnowPro Core (April 2025)
Microsoft Certified: Power BI Data Analyst Associate (June 2024)
Microsoft Certified: Azure Fundamentals (April 2024)
Experience
Data Pipeline Design
Designed and maintained scalable data pipelines (ETL) using Python and PySpark on cloud platforms like Azure Databricks, handling over 200,000 daily records for data warehouses like Snowflake.
Data Integration & Curation
Organized and integrated cross-functional datasets from various sources using SQL and Python. This work supported AI-driven insights and improved business reporting accuracy by 30%.
Automation & Optimization
Automated dbt transformation models and Power BI report releases through CI/CD pipelines, reducing manual release time by 50%. Optimized cloud storage (Azure Blob Storage, ADLS), cutting storage costs by 25%.
Key Technical Contributions
Agile Project Management
Managed project tasks and collaborated in Agile teams, accelerating delivery by 20% and improving documentation for clarity and efficiency.
Data Integration & Modeling
Built data pipelines using various tools (Azure Data Factory, Informatica, PySpark) to combine information from different systems into our data warehouse, boosting data reliability by 35%.
System Performance
Streamlined software deployments using version control and scripting, reducing setup time by 40%. Optimized database queries and ensured data quality, improving performance by 30%.
Advanced Data Analytics
Created clear dashboards and reports with Power BI, Tableau, SQL, and Python, turning raw data into actionable insights, improving decision-making accuracy by 25%, and enabling more proactive business strategies.
Efficient Data Migration
Led and executed projects migrating large volumes of data from legacy systems (DB2, Oracle) to modern platforms such as Snowflake, using Azure Data Factory and Informatica to keep data accurate and minimize interruptions during the transfers.
Robust Data Validation
Designed and implemented rigorous data validation systems that significantly enhance data quality and reliability, developing automated checks and custom SQL/Python scripts to identify and fix discrepancies, reducing data error rates by 30%.
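A minimal sketch of the kind of automated check described above. The field names (`customer_id`, `amount`) are illustrative, not the production schema, and plain Python stands in for the actual SQL/Python validation scripts:

```python
def validate_rows(rows):
    """Flag rows failing basic quality checks: missing keys, duplicates, bad amounts.

    `rows` is a list of dicts; field names here are hypothetical examples.
    Returns a list of (row_index, issue) tuples for downstream review.
    """
    issues = []
    seen_ids = set()
    for i, row in enumerate(rows):
        cid = row.get("customer_id")
        if cid is None:
            issues.append((i, "missing customer_id"))
        elif cid in seen_ids:
            issues.append((i, "duplicate customer_id"))
        else:
            seen_ids.add(cid)
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            issues.append((i, "invalid amount"))
    return issues
```

In practice checks like these run automatically after each load, so discrepancies are caught before they reach reports.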
Featured Projects
Key projects showcasing technical expertise and innovation in data engineering and analytics.
Talk2SQL – Natural Language Database Assistant
Challenge
Non-technical users struggled to access and analyze database data without knowing SQL, creating a bottleneck that slowed decision-making and reduced efficiency.
Solution
Created Talk2SQL, an easy-to-use AI assistant that turns everyday language into precise SQL commands. It uses Streamlit for a simple interface and LangChain with the LLaMA 3 model served via Groq for language processing, making data accessible to everyone.
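A simplified sketch of the Talk2SQL flow. The real app uses LangChain and Groq-hosted LLaMA 3; here the model is stubbed as a plain callable (`fake_llm`) so the question-to-SQL-to-results structure is visible without API keys, and SQLite stands in for the target database:

```python
import sqlite3

def nl_to_sql(question, schema, llm):
    """Build a prompt from the schema and question, then ask the model for SQL."""
    prompt = (
        f"Given this schema:\n{schema}\n"
        f"Write one SQL query answering: {question}\n"
        "Return only the SQL."
    )
    return llm(prompt).strip()

def answer(question, schema, llm, conn):
    """Generate SQL, run it, and return both the SQL and the rows,
    mirroring how the chat UI shows the query alongside its results."""
    sql = nl_to_sql(question, schema, llm)
    rows = conn.execute(sql).fetchall()
    return sql, rows

# Stub standing in for the LangChain + Groq LLaMA 3 call.
def fake_llm(prompt):
    return "SELECT COUNT(*) FROM users;"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Ann",), ("Bo",)])
sql, rows = answer("How many users are there?", "users(id, name)", fake_llm, conn)
```

Swapping `fake_llm` for a real LLM call is the only change needed to make the same skeleton production-shaped.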
Results
Generates SQL with 100% accuracy, eliminating manual errors and ensuring reliable data.
Works with many database systems, including SQLite, PostgreSQL, and MySQL.
Speeds up query response time by 30% by optimizing data retrieval and storage.
Builds user confidence with a clear chat interface that instantly shows both the generated SQL and its results.
Kroger Household Transactions Analysis
Challenge
Kroger needed to process and analyze huge volumes of household transaction data, but its legacy systems couldn't keep up, delaying useful insights and quick, informed business decisions.
Solution
I built a robust Python application using Flask to analyze Kroger's vast retail data, deployed it on Azure Cloud, and used Azure SQL for better data handling and storage. A simple HTML/CSS interface made it easy for users to interact with, and an integrated Tableau dashboard helped everyone see and understand the data clearly.
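To illustrate the kind of aggregation the Flask endpoints and Tableau dashboard were built around, here is a hedged sketch: table and column names (`transactions`, `household_id`, `spend`) are hypothetical, and an in-memory SQLite database stands in for Azure SQL:

```python
import sqlite3

def top_households_by_spend(conn, limit=3):
    """Aggregate total spend per household and return the biggest spenders.

    Illustrative schema only; the production version ran against Azure SQL.
    """
    return conn.execute(
        """
        SELECT household_id, ROUND(SUM(spend), 2) AS total_spend
        FROM transactions
        GROUP BY household_id
        ORDER BY total_spend DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (household_id INTEGER, spend REAL)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [(1, 20.0), (2, 5.5), (1, 4.5), (3, 12.0)],
)
top = top_households_by_spend(conn, limit=2)
```

Pushing the grouping into SQL like this, rather than aggregating in Python, is what keeps retrieval fast as transaction volume grows.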
Results
Improved data processing speed by 35%, enabling quicker insights from massive transaction data.
Boosted query performance by 35% thanks to better data management and Azure SQL integration, allowing fast data retrieval.
Increased data visibility by 40% through the easy-to-use HTML-CSS interface and integrated Tableau dashboard, leading to better business choices.
Processing Efficiency
+35% Improvement
Query Performance
+35% Boost
Data Visibility
+40% Enhancement
Advanced Weather Forecasting with Big Data
Challenge
The challenge was to analyze vast global weather data from 2010 to 2022 to identify temperature patterns and improve forecast accuracy. The sheer volume and variety of the data, sourced from over 5,000 weather stations across more than 50 files, demanded a powerful and adaptable processing solution.
Solution
I built a PySpark-based system to process these large weather datasets, focusing on efficiently cleaning, transforming, and analyzing the data, then developed advanced PySpark models to predict temperature trends, significantly improving forecasting accuracy. Interactive Power BI dashboards were also integrated to provide clear insights for easier data analysis and visualization.
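The cleaning-and-aggregation step can be sketched in miniature. The production pipeline used PySpark over 50+ files; this plain-Python version with made-up station readings just shows the logic (drop missing measurements, then average per year):

```python
from collections import defaultdict

def yearly_avg_temps(readings):
    """Clean and aggregate raw station readings into average temperature per year.

    `readings` are (station_id, year, temp_c) tuples with illustrative values;
    None temperatures are dropped, mirroring the pipeline's cleaning step.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station_id, year, temp_c in readings:
        if temp_c is None:  # missing measurement: filter out
            continue
        sums[year] += temp_c
        counts[year] += 1
    return {year: round(sums[year] / counts[year], 2) for year in sums}

readings = [
    ("ST001", 2010, 14.0),
    ("ST002", 2010, 16.0),
    ("ST001", 2011, None),   # dropped by the cleaning step
    ("ST002", 2011, 15.5),
]
averages = yearly_avg_temps(readings)
```

In PySpark the same shape becomes a `filter` on non-null temperatures followed by a `groupBy("year").avg("temp_c")`, which is what lets it scale to thousands of stations.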
Results
Successfully processed and analyzed data from over 5,000 global weather stations (2010–2022).
Improved weather forecast accuracy through advanced temperature prediction models.
Created and deployed interactive dashboards in Power BI for intuitive data analysis and informed decisions.
Sources
5,000+ weather stations
Period
2010–2022 (13 years)
Files
50+ data files
Outcome
Improved forecast accuracy
My Career Focus
Desired Roles
Seeking Data Engineer and Data Analyst positions. Open to remote, on-site, or hybrid work environments.
Company Alignment
Interested in companies that prioritize data innovation, leverage cloud platforms (like Azure or AWS), and build robust, scalable data systems.
Technical Enthusiasm
Passionate about designing efficient ETL/ELT data pipelines, building data warehouses, and creating effective business intelligence dashboards.