Leonardo Siu (systems + backend)

NLP Bias Detection

An end‑to‑end NLP system that detects demographic bias in large social datasets by preprocessing tens of thousands of samples and fine‑tuning transformer models for reliable, actionable signals.

Role: ML Engineer
Timeline: Aug 2025 - Present
Domain: NLP / AI

What it is

This project investigates how bias present in real-world text datasets influences language model behavior. Using the RedditBias dataset, we built a pipeline that cleans noisy social data, engineers bias-aware features, and fine-tunes a transformer-based classifier (BERT) to identify biased content. The goal was not just model performance, but understanding where bias emerges, how it manifests, and how reliably it can be detected.
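The cleaning stage of the pipeline can be sketched as a small regex-based normalizer. This is an illustrative helper (`clean_comment` and its specific rules are assumptions, not the project's actual preprocessing code):

```python
import re

def clean_comment(text: str) -> str:
    """Normalize a raw Reddit comment (hypothetical helper; the real
    pipeline's rules may differ)."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)      # drop URLs
    text = re.sub(r"/?u/\w+|/?r/\w+", " ", text)   # drop user/subreddit mentions
    text = re.sub(r"[^a-z0-9\s.,!?']", " ", text)  # strip stray symbols
    text = re.sub(r"\s+", " ", text).strip()       # collapse whitespace
    return text

print(clean_comment("Check https://example.com NOW!!"))
```

Cleaned text like this then feeds the feature-engineering and tokenization steps downstream.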

My Role

I led data preprocessing and exploration to make the dataset usable and trustworthy: addressing class imbalance, cleaning and standardizing text, engineering features, and stripping irregular characters via regex. I also instrumented runs with Weights & Biases for hyperparameter tuning and model tracking, sweeping over learning rate, batch size, learning-rate schedule, and weight decay, and adding dropout and early stopping to curb overfitting. The tuning was pivotal: the fine-tuned model initially overfit, with validation loss climbing above 1.5 despite strong precision, recall, F1, and accuracy. Through disciplined tuning and run tracking, we reduced validation loss to roughly 0.45 while maintaining or improving core metrics. I additionally helped implement a lightweight inference service so users could interact with the model via the project website.
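The early-stopping logic described above can be sketched in isolation. This is a minimal, self-contained illustration with a synthetic validation-loss curve (the real runs tracked losses in Weights & Biases, and `train_with_early_stopping` is a hypothetical name):

```python
def train_with_early_stopping(val_losses, patience=2):
    """Return (best_epoch, best_loss): stop once validation loss fails
    to improve for `patience` consecutive epochs.
    Illustrative only; not the project's actual training loop."""
    best, best_epoch, stale = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch, stale = loss, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best

# Synthetic curve: loss improves, then diverges as the model overfits.
epoch, loss = train_with_early_stopping([1.5, 0.9, 0.6, 0.45, 0.55, 0.7])
print(epoch, loss)
```

Stopping at the minimum of the validation curve, rather than training to completion, is what keeps the gap between training and validation loss from widening.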

Interesting Constraints

  • Real-world data noise: Reddit comments contained inconsistent formatting, ambiguous labels, and missing values.
  • Moderate class imbalance: more biased samples than non-biased, requiring careful evaluation.
  • Overfitting risk: early training runs showed divergence between training and validation loss.
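One standard way to handle the class imbalance above is inverse-frequency class weighting, where rarer classes contribute more to the loss. A minimal sketch (the 70/30 split shown is an illustrative assumption, not the dataset's actual ratio):

```python
from collections import Counter

def class_weights(labels):
    """Inverse-frequency weights: w_c = N / (num_classes * count_c),
    so rarer classes get larger weights in a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# e.g. 70 biased vs 30 non-biased samples (illustrative counts)
labels = ["biased"] * 70 + ["non-biased"] * 30
print(class_weights(labels))
```

These weights can be passed to a weighted cross-entropy loss so the minority class is not drowned out during training.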

What I Learned

  • Data quality drives outcomes: robust cleaning, label hygiene, and stratified splits matter as much as model choice.
  • Tune for generalization: hyperparameters and regularization are levers to reduce validation loss without sacrificing core metrics.
  • Ship the interface: a lightweight inference service with clear contracts turns models into usable products.

Tech Stack

Python, PyTorch, scikit-learn, pandas, NumPy, matplotlib, seaborn, Transformers, Weights & Biases, Vercel, Docker, FastAPI, Node.js, Express.js