ML Model Poisoning Simulation

Simulate data poisoning and evaluate its impact on accuracy and targeted misclassifications.

Marcos Martín

This project explores data poisoning attacks in supervised and deep learning pipelines. It demonstrates how injecting crafted samples into training data can bias model behavior or embed targeted backdoors. Experiments measure accuracy degradation, targeted misclassification rates, and the resilience of different defense mechanisms.

Tech Stack

  • Python · PyTorch · Scikit-learn · Pandas
  • Poisoning scenarios with targeted & untargeted backdoors
  • Defense experiments: anomaly detection & data sanitization

Example (Label-flip attack)


import numpy as np

def label_flip(y, flip_ratio=0.1):
    """Randomly flip a fraction of binary (0/1) labels in a NumPy array."""
    y_poisoned = y.copy()
    n_flip = int(len(y) * flip_ratio)                      # number of labels to corrupt
    idx = np.random.choice(len(y), n_flip, replace=False)  # poisoned indices, no repeats
    y_poisoned[idx] = 1 - y_poisoned[idx]                  # flip 0↔1
    return y_poisoned
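
To quantify the impact of such an attack, a minimal sketch can train one model on clean labels and one on flipped labels and compare them on the same held-out test set (scikit-learn is in the stack; the synthetic dataset, split sizes, and flip ratio below are illustrative assumptions, not the project's actual configuration):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Same held-out test set for both models; only the training labels differ.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

y_train_poisoned = label_flip(y_train, flip_ratio=0.2)

for name, labels in [("clean", y_train), ("poisoned", y_train_poisoned)]:
    clf = LogisticRegression(max_iter=1000).fit(X_train, labels)
    pred = clf.predict(X_test)
    print(f"{name}: accuracy={accuracy_score(y_test, pred):.3f}, F1={f1_score(y_test, pred):.3f}")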

Project Highlights

  • Attack Scenarios: Label-flip, backdoor trigger injection (sketched below), and gradient-based poisoning.
  • Impact Evaluation: Track accuracy drop and F1-score variance across multiple runs.
  • Defense Techniques: Data sanitization filters, robust aggregation, and anomaly-based detection (see the sanitization sketch below).
  • Visualization: Interactive charts for poisoned vs. clean decision boundaries.
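
For the trigger-injection scenario, a minimal sketch could look like the following hypothetical inject_backdoor helper (the 3×3 corner patch, trigger value, and target label are illustrative assumptions, not the project's exact trigger design):

import numpy as np

def inject_backdoor(X, y, poison_ratio=0.05, target_label=0, trigger_value=1.0):
    # Assumes X has shape (n_samples, height, width); stamp a small patch
    # onto a random subset of samples and relabel them as the target class.
    X_poisoned, y_poisoned = X.copy(), y.copy()
    n_poison = int(len(X) * poison_ratio)
    idx = np.random.choice(len(X), n_poison, replace=False)
    X_poisoned[idx, -3:, -3:] = trigger_value  # trigger patch in the bottom-right corner
    y_poisoned[idx] = target_label             # tie the trigger to the attacker's class
    return X_poisoned, y_poisoned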

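On the defense side, a simple anomaly-based sanitization pass over a flat feature matrix can be sketched as follows (IsolationForest and the per-class filtering strategy are illustrative stand-ins for the detectors used in the experiments):

import numpy as np
from sklearn.ensemble import IsolationForest

def sanitize(X, y, contamination=0.1):
    # Per-class filter: drop points whose features look anomalous for the
    # class they are labelled with (flipped labels tend to stand out).
    keep = np.zeros(len(y), dtype=bool)
    for cls in np.unique(y):
        idx = np.where(y == cls)[0]
        detector = IsolationForest(contamination=contamination, random_state=0)
        keep[idx] = detector.fit_predict(X[idx]) == 1  # 1 = inlier, -1 = flagged
    return X[keep], y[keep]
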
Artifacts