Bibek Dhakal - Data Engineer

Data Engineer

I am a Data Engineer with expertise in designing and implementing scalable cloud data pipelines, ETL automation, and data infrastructure optimization. My work focuses on building reliable data systems that enable data-driven decision making across healthcare, e-commerce, and fintech domains.

I hold a Bachelor of Engineering in Computer Engineering from Tribhuvan University and am currently based in Kathmandu, Nepal.


Areas of Expertise

As a data engineering professional, I specialize in building scalable data infrastructure and implementing efficient data solutions for enterprise environments. My expertise spans cloud data engineering, large-scale data processing, and machine learning operations, with a focus on delivering measurable business value through robust, production-ready systems.

Core Competencies

Cloud Data Architecture: Designing and implementing enterprise-grade data pipelines on AWS and Azure, with a proven track record in reliability, performance optimization, and cost reduction. Expertise in cloud-native technologies including S3, EMR, Glue, Azure Data Factory, and Databricks.
Large-Scale Data Processing: Building and optimizing distributed data processing systems using Apache Spark, with demonstrated success in handling 2TB+ daily data volumes. Proficient in query optimization, performance tuning, and managing both batch and real-time streaming workloads.
ETL Pipeline Development: Developing production-grade ETL/ELT workflows using Apache Airflow and modern orchestration tools. Strong focus on data quality assurance, automated validation, and managing complex data dependencies across multiple systems.
Database Optimization & Management: Extensive experience in database performance tuning, query optimization, and infrastructure monitoring. Proven ability to achieve significant performance improvements in production environments, including a 60% improvement in query response times and 12× throughput gains.
Machine Learning Integration: Implementing end-to-end ML pipelines including NLP, time series forecasting, and predictive modeling. Experience in productionizing machine learning models and integrating them into existing data infrastructure.

Professional Experience

With more than five years of professional experience in data engineering, I have worked on diverse projects spanning healthcare data systems, large-scale analytics platforms, financial services infrastructure, and machine learning applications.

Key Contributions & Technical Achievements

Healthcare Data Migration & Infrastructure: Leading zero-downtime data migration projects for healthcare systems, architecting Airflow-based orchestration platforms, and implementing ETL pipelines for centralized analytics warehouses. Developing Power BI models with advanced security features for HIPAA-compliant reporting.
Large-Scale Data Processing Optimization: Engineered production-grade Apache Spark pipelines processing 2TB+ of data daily on AWS EMR. Optimization work cut job execution times by 87% and delivered a 12× throughput improvement in MongoDB aggregation pipelines. Implemented SLA monitoring frameworks that improved system reliability.
Financial Services Data Systems: Maintained mission-critical data systems for a digital payment platform processing 2M+ daily transactions. Developed PII detection pipelines and automated reporting workflows, and optimized database performance, achieving a 60% improvement in query response times.
Machine Learning & Fraud Detection: Developed Python-based ETL applications, built fraud detection systems that reduced fraudulent activity by 35%, and applied NLP techniques for sentiment analysis. Delivered training sessions on data engineering fundamentals.

Technical Skills

Cloud & Infrastructure

AWS (S3, EMR, Glue, RDS, Lambda, CloudWatch, Redshift)
Azure (Data Factory, Databricks, Synapse Analytics, Blob Storage)
Docker, CI/CD pipelines

Data Engineering

Apache Spark (PySpark, Scala)
Apache Airflow, Apache Kafka
TimeXtender, dbt
ETL/ELT Design Patterns

Programming & Databases

Python, SQL (PostgreSQL, MySQL, T-SQL), Scala
PostgreSQL, MySQL, MongoDB, Azure SQL, MariaDB
Data Warehousing, Star Schema Modeling

Analytics & Visualization

Pandas, NumPy, Scikit-learn
Power BI, DAX, Data Modeling
Spark MLlib, TensorFlow
Statistical Analysis, Time Series

Selected Projects

Twitter Sentiment Analysis

Developed a comprehensive sentiment analysis system for Twitter data using advanced natural language processing techniques. Implemented Word2Vec for feature extraction and XGBoost for classification, achieving high accuracy in sentiment prediction.

View on GitHub →

Market Basket Analysis & Recommendation System

Major research project analyzing big data for product bundle recommendations. Utilized PySpark for distributed processing, Word2Vec for text embedding, and K-means clustering combined with bi-gram frequency analysis to generate intelligent product recommendations.

View on GitHub →

Time Series Forecasting - Passenger Count Prediction

Applied SARIMA (Seasonal AutoRegressive Integrated Moving Average) modeling to passenger count forecasting. Conducted a comprehensive time series analysis, achieving an RMSE of 68.132.

View on GitHub →

Weather Forecasting System

Developed a weather prediction system using SARIMA modeling, with comprehensive data analysis and visualization. Achieved an RMSE of 2.19 on the forecasts.

View on GitHub →
