Bibek Dhakal

Data Engineer

I am a Data Engineer with expertise in designing and implementing scalable cloud data pipelines, ETL automation, and data infrastructure optimization. My work focuses on building reliable data systems that enable data-driven decision making across healthcare, e-commerce, and fintech domains.

I hold a Bachelor of Engineering in Computer Engineering from Tribhuvan University and am currently based in Kathmandu, Nepal.

Areas of Expertise

As a data engineering professional, I specialize in building scalable data infrastructure and implementing efficient data solutions for enterprise environments. My expertise spans cloud data engineering, large-scale data processing, and machine learning operations, with a focus on delivering measurable business value through robust, production-ready systems.

Core Competencies

Professional Experience

With 5+ years of professional experience in data engineering, I have worked on projects spanning healthcare data systems, large-scale analytics platforms, financial services infrastructure, and machine learning applications.

Key Contributions & Technical Achievements

Technical Skills

Cloud & Infrastructure

  • AWS (S3, EMR, Glue, RDS, Lambda, CloudWatch, Redshift)
  • Azure (Data Factory, Databricks, Synapse Analytics, Blob Storage)
  • Docker, CI/CD pipelines

Data Engineering

  • Apache Spark (PySpark, Scala)
  • Apache Airflow, Apache Kafka
  • TimeXtender, dbt
  • ETL/ELT Design Patterns

Programming & Databases

  • Python, SQL, T-SQL, Scala
  • PostgreSQL, MySQL, MongoDB, Azure SQL, MariaDB
  • Data Warehousing, Star Schema Modeling

Analytics & Visualization

  • Pandas, NumPy, Scikit-learn
  • Power BI, DAX, Data Modeling
  • Spark MLlib, TensorFlow
  • Statistical Analysis, Time Series

Selected Projects

Twitter Sentiment Analysis

Developed a comprehensive sentiment analysis system for Twitter data using advanced natural language processing techniques. Implemented Word2Vec for feature extraction and XGBoost for classification, achieving high accuracy in sentiment prediction.

View on GitHub →

Market Basket Analysis & Recommendation System

Major research project analyzing big data for product bundle recommendations. Utilized PySpark for distributed processing, Word2Vec for text embedding, and K-means clustering combined with bi-gram frequency analysis to generate intelligent product recommendations.

View on GitHub →
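As a rough illustration of the bi-gram frequency step mentioned above, the sketch below counts adjacent word pairs across a few product descriptions using only the Python standard library. The sample descriptions and the function name `bigram_counts` are made up for this example; the actual project used PySpark, and its code is not reproduced here.

```python
from collections import Counter


def bigram_counts(texts):
    """Count adjacent word pairs (bi-grams) across a list of strings."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # zip pairs each token with its successor: (t0, t1), (t1, t2), ...
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical product descriptions, for illustration only.
sample = ["red ceramic mug", "blue ceramic mug", "ceramic mug set"]
counts = bigram_counts(sample)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Frequent pairs such as ("ceramic", "mug") in this toy data hint at which terms tend to appear together, which is one plausible signal for grouping products into bundles.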

Time Series Forecasting - Passenger Count Prediction

Applied SARIMA (Seasonal AutoRegressive Integrated Moving Average) modeling to forecast passenger counts. Conducted comprehensive time series analysis, achieving an RMSE of 68.132 and demonstrating proficiency in statistical modeling and predictive analytics.

View on GitHub →
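For readers unfamiliar with the metric quoted above, RMSE (root mean squared error) is the square root of the average squared difference between observed and predicted values. The snippet below is a minimal standard-library sketch; the function name `rmse` and the sample numbers are illustrative assumptions and are not taken from the project.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observations and forecasts, for illustration only.
actual = [100.0, 120.0, 140.0]
predicted = [110.0, 120.0, 130.0]
score = rmse(actual, predicted)
print(round(score, 3))
```

Because RMSE is expressed in the same units as the target variable, values from different datasets (for example passenger counts versus weather measurements) are not directly comparable.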

Weather Forecasting System

Developed a weather prediction system using SARIMA modeling with comprehensive data analysis and visualization. Achieved an RMSE of 2.19, demonstrating strong capability in time series analysis and meteorological data processing.

View on GitHub →

Contact

I am interested in opportunities involving data engineering, machine learning systems, and scalable data infrastructure. I am open to relocation for the right opportunity in Canada, the USA, or Europe.