Machine Learning & MLOps Engineer
I design, build, and scale intelligent systems — from streaming NLP pipelines to real-time LLM applications. Driven by curiosity, I combine data engineering with AI research to turn ideas into production-ready systems.
View My LinkedIn Profile
Databricks | PySpark Structured Streaming | Delta Lake | Hugging Face | MLflow
Designed and deployed a real-time streaming pipeline that classifies tweet sentiment at scale by running a transformer-based NLP model inside Apache Spark Structured Streaming.
The system processes millions of tweets with low latency, leveraging a Delta Lake multi-layer architecture (Bronze → Silver → Gold) and MLflow for tracking, model registry, and deployment.
Goal: Bridge the gap between scalable data engineering and NLP by integrating distributed streaming with real-time model inference.
Data Flow:
Twitter Stream (JSON) → Bronze (Raw) → Silver (Cleaned) → Gold (Predicted)
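As an illustration of the Bronze → Silver hop, here is a minimal sketch rather than the project's actual code. The paths, the `clean_text` column name, and both function names are hypothetical; only the `full_text` field comes from the component list. The cleaning rule (replace mentions with `@user` and links with `http`) follows the preprocessing suggested on the cardiffnlp model card.

```python
import re

# Hypothetical locations; the real table paths are not stated in this write-up.
BRONZE_PATH = "/tmp/tweets/bronze"
SILVER_PATH = "/tmp/tweets/silver"
CHECKPOINT_PATH = "/tmp/tweets/_checkpoints/silver"

_URL = re.compile(r"https?://\S+")
_MENTION = re.compile(r"@\w+")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Replace links and mentions with placeholders and collapse whitespace."""
    if text is None:
        return None
    text = _URL.sub("http", text)
    text = _MENTION.sub("@user", text)
    return _WHITESPACE.sub(" ", text).strip()


def start_silver_stream(spark, bronze_path=BRONZE_PATH,
                        silver_path=SILVER_PATH,
                        checkpoint_path=CHECKPOINT_PATH):
    """Read the Bronze Delta table as a stream and append cleaned rows to Silver."""
    # Imported lazily so clean_text can be used and tested without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    clean_udf = F.udf(clean_text, StringType())

    bronze = spark.readStream.format("delta").load(bronze_path)
    silver = (
        bronze
        .filter(F.col("full_text").isNotNull())
        .withColumn("clean_text", clean_udf(F.col("full_text")))
    )
    return (
        silver.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(silver_path)
    )
```

Keeping `clean_text` as a plain Python function makes the cleaning rule easy to unit test on its own; on a cluster, `start_silver_stream(spark)` would return a `StreamingQuery` handle for the running job.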
Components:
Model: cardiffnlp/twitter-roberta-base-sentiment (Hugging Face)
Source path: s3a://voc-75-databricks-data/voc_volume/
Fields: full_text, timestamp, lang, user_id
📎 Full Code:
👉 GitHub — Starter Streaming Tweet Sentiment (Spring 2024 Final Project)
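For readers who want a feel for the Silver → Gold prediction step before opening the repository, the following is a rough sketch under stated assumptions, not the project's implementation. The function names and the `sentiment` column are hypothetical. The label mapping reflects how this checkpoint reports classes (`LABEL_0`, `LABEL_1`, `LABEL_2` for negative, neutral, positive).

```python
from typing import Iterator

MODEL_NAME = "cardiffnlp/twitter-roberta-base-sentiment"

# Raw labels emitted by this checkpoint, mapped to readable names.
LABELS = {
    "LABEL_0": "negative",
    "LABEL_1": "neutral",
    "LABEL_2": "positive",
}


def to_sentiment(raw_label):
    """Translate a raw model label; unknown labels pass through unchanged."""
    return LABELS.get(raw_label, raw_label)


def add_predictions(df, text_col="full_text"):
    """Return df with a hypothetical 'sentiment' column from batched inference."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import pandas as pd
    from pyspark.sql.functions import col, pandas_udf
    from transformers import pipeline

    @pandas_udf("string")
    def predict(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
        # Iterator-style pandas UDF: the model loads once per task, not per batch.
        classifier = pipeline("sentiment-analysis", model=MODEL_NAME)
        for texts in batches:
            results = classifier(texts.fillna("").tolist(), truncation=True)
            yield pd.Series([to_sentiment(r["label"]) for r in results])

    return df.withColumn("sentiment", predict(col(text_col)))
```

The iterator form of the pandas UDF is chosen here because loading a transformer is expensive; reusing one pipeline across all batches in a task keeps per-batch latency closer to pure inference time.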
📘 Keywords:
PySpark Structured Streaming | Delta Lake | Databricks | Transformer Models | MLflow | Real-Time Inference | Data Engineering | Hugging Face | MLOps | Sentiment Analysis