Wonha (Leah) Shin



Machine Learning & MLOps Engineer

I design, build, and scale intelligent systems, from streaming NLP pipelines to real-time LLM applications. Driven by curiosity, I combine data engineering with AI research to turn ideas into production-ready systems.

View My LinkedIn Profile

View My GitHub Profile

💬 Cross-Linguistic NLP Analysis of E-Cigarette Perceptions on Social Media

Python | Hugging Face | BERTweet | Twitter TwHIN-BERT | Sentence-BERT | UMAP | K-Means | BERTopic | Multilingual NLP | Public Health AI


🌍 Overview

Led a multilingual NLP capstone project analyzing 51K English and 7K Spanish tweets (sampled from 1.1M+ total) to uncover cultural and linguistic differences in public e-cigarette discourse.
The project combined Transformer-based models with BERTopic clustering to show how those differences shape online conversations about vaping and health.

Goal: Bridge the gap between multilingual social media analysis and public health strategy by uncovering culturally specific narratives in e-cigarette discussions.


🧩 Dataset & Preprocessing


🧠 Model Training & Fine-Tuning

Built and fine-tuned multiple Transformer-based classifiers to detect multilingual relevance, commercial intent, and attitude toward e-cigarettes.
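
As a rough illustration of how the three classification tasks could be set up with Hugging Face, here is a minimal sketch rather than the project's actual code. The label sets, the `TASKS` dictionary, the `label_maps` and `load_classifier` helpers, and the `vinai/bertweet-base` checkpoint are all assumptions chosen for the example. The `transformers` import is deferred so the label bookkeeping can be checked without downloading a model.

```python
from typing import Dict, List, Tuple

# Hypothetical label sets for the three tasks; the full report defines the real ones.
TASKS: Dict[str, List[str]] = {
    "relevance": ["irrelevant", "relevant"],
    "commercial": ["non_commercial", "commercial"],
    "attitude": ["negative", "neutral", "positive"],
}


def label_maps(labels: List[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Build the id2label / label2id mappings that Hugging Face configs expect."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(task: str, checkpoint: str = "vinai/bertweet-base"):
    """Load a tokenizer and a sequence-classification head sized for `task`.

    The default checkpoint is an assumption; any encoder on the Hub can be swapped in.
    """
    # Imported lazily so this module loads even without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(TASKS[task])
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(id2label),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

Calling `load_classifier("attitude")` would return a tokenizer and a three-class model ready to pass to a training loop or the `Trainer` API. Keeping one head per task, instead of a single multi-label head, lets each classifier be fine-tuned and evaluated independently.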


🔍 Topic Modeling Framework
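
The tools listed at the top of this page (Sentence-BERT, UMAP, K-Means, BERTopic) suggest a pipeline along the following lines. This is a minimal sketch under stated assumptions, not the project's configuration: the `group_by_language` and `build_topic_model` helpers, the `paraphrase-multilingual-MiniLM-L12-v2` embedding model, the topic count, and the UMAP settings are illustrative choices, and fitting one model per language is assumed here as a simple way to compare English and Spanish topics.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_language(tweets: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (language_code, text) pairs so each language can get its own topic model."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for lang, text in tweets:
        grouped[lang].append(text)
    return dict(grouped)


def build_topic_model(n_topics: int = 10, seed: int = 42):
    """Assemble BERTopic with Sentence-BERT embeddings, UMAP reduction, and K-Means."""
    # Heavy dependencies are imported lazily; all values below are illustrative.
    from bertopic import BERTopic
    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans
    from umap import UMAP

    embedder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
    reducer = UMAP(
        n_neighbors=15,
        n_components=5,
        min_dist=0.0,
        metric="cosine",
        random_state=seed,
    )
    clusterer = KMeans(n_clusters=n_topics, random_state=seed, n_init=10)
    # BERTopic accepts a scikit-learn style clusterer through `hdbscan_model`,
    # which replaces its default HDBSCAN step with K-Means.
    return BERTopic(
        embedding_model=embedder,
        umap_model=reducer,
        hdbscan_model=clusterer,
    )
```

With the tweets grouped, `build_topic_model().fit_transform(grouped["en"])` returns a topic assignment per tweet. Unlike HDBSCAN, K-Means assigns every document to a topic, so no tweets are left in an outlier bucket.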


📊 Key Findings


⚙️ Challenges


🚀 Future Work


📄 Full Report: E-Cigarette Perception Analysis (PDF)


📘 Keywords:
Multilingual NLP | BERTweet | Sentence-BERT | BERTopic | Cross-Linguistic Analysis |
Transformer Models | Public Health | UMAP | Clustering | Cultural Interpretation