Wonha (Leah) Shin



Machine Learning & MLOps Engineer

I design, build, and scale intelligent systems — from streaming NLP pipelines to real-time LLM applications. Driven by curiosity, I combine data engineering with AI research to turn ideas into production-ready systems.

View My LinkedIn Profile

View My GitHub Profile

Portfolio


My GitHub Blog

May 1st, 2024 — I made a promise to rebuild from the ground up.
After two years of studying data science, I realized I could use algorithms but didn’t understand them deeply enough.
So I started over — posting every day to rebuild my intuition and technical depth.

Over the following year, I revisited Statistics, Probability, Machine Learning, and Databases, then expanded into Deep Learning, LLMs, and MLOps. That journey took me from learning algorithms to leading real-world ML & MLOps projects.

Through this commitment, I’ve written over 200 in-depth posts exploring everything from the foundations of statistics to the architecture of transformers and production-grade ML systems.

Each post became a reflection of growth — documenting not only what I learned but how I built, deployed, and optimized real systems.
Today, the blog stands as a living record of my continuous evolution from student to Machine Learning & MLOps Engineer — grounded in curiosity, consistency, and craftsmanship.

🔗 Visit My Blog →
📄 Download My 200-Day Challenge Summary (PDF)


Data Science Projects

🔗 LLMOps: LLM-Powered Real-Time Translation System

| Kafka | FastAPI | Docker | AWS EC2 | Prometheus | Grafana | Hugging Face | LLMOps | MLOps | Machine Translation |

Designed and implemented a real-time multilingual translation system that pairs LLM-based machine translation with scalable cloud infrastructure to translate continuous news streams with low latency.



🔗 Real-Time Tweet Sentiment Analysis Pipeline

| PySpark Structured Streaming | Delta Lake | Databricks | Transformer Models | MLflow | Real-Time Inference | Hugging Face | MLOps | Sentiment Analysis |

Built a real-time tweet sentiment classification pipeline on Databricks using Spark Structured Streaming and Transformer models, with MLflow for experiment tracking and Delta Lake for fault-tolerant storage.
Delivered live dashboards visualizing sentiment trends across millions of tweets.


🔗 Multilingual NLP Analysis of E-Cigarette Perceptions on Social Media

| NLP | BERTopic | RoBERTa | Hugging Face | Multilingual Analysis | Sentiment Modeling | Data Visualization |

Analyzed 500K+ multilingual social media posts (English & Spanish) to uncover public perceptions of e-cigarettes using BERTopic and RoBERTa, identifying cross-lingual sentiment shifts and key discussion themes.


🔗 Cross-Cultural NLP Analysis of Luxury Hotel Reviews in Europe — LDA Topic Modeling

| LDA | Topic Modeling | NLP | Sentiment Analysis | Cross-Cultural Analytics | Python | Visualization |

Explored European luxury hotel reviews using LDA topic modeling to uncover country-specific satisfaction drivers, linking linguistic tone to cultural preferences and customer experience.


🔗 Quality of Life Analysis: Tri-State Visualization (Report PDF)

| Data Visualization | Tableau | Statistics | Socioeconomic Indicators | Public Data Analysis |

Developed an interactive dashboard comparing education, income, housing, and healthcare metrics across New York, New Jersey, and Connecticut to evaluate regional quality of life patterns.


🔗 Multidimensional Analysis of Video Game Sales and Global Market Trends (Report PDF)

| Statistics | Regression Analysis | Market Analytics | Data Visualization | Exploratory Data Analysis |

Performed multivariate statistical modeling on global video game sales to uncover genre, platform, and regional trends, providing insight into market dynamics and sales forecasting.


COVID-19 Awareness & Prediction Analysis through ML & DL

| Machine Learning | Deep Learning | Time Series Forecasting | Data Visualization | Public Health Analytics |

Developed ML/DL models to predict COVID-19 trends in Ohio using feature engineering and time-series modeling, revealing key behavioral and social factors influencing public awareness.


Resume and Other Docs

🙌 Most Up-to-Date Resume Here


Email Me

Wonha Shin / leahnote01@gmail.com 📩 Email me!