Adversarial Image Attack Lab

Generating and defending adversarial examples for image classifiers (FGSM, PGD, DeepFool).

Marcos Martín

This lab explores how small, carefully crafted perturbations can cause convolutional neural networks to misclassify images. We implement common attack algorithms (FGSM, PGD, DeepFool), measure how much each one degrades classification accuracy, and test defenses such as adversarial training and input preprocessing.

Tech Stack

  • Python · PyTorch · torchvision
  • Streamlit · Jupyter Notebooks
  • Docker · GitHub Actions CI

Project Overview

  • Offense: generate adversarial images with FGSM, PGD, and DeepFool (a PGD sketch follows this list).
  • Defense: apply adversarial training and preprocessing filters.
  • Visualization: Streamlit UI comparing original vs adversarial outputs.
  • Reproducibility: Dockerfile and CI workflow for portability.
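
The PGD attack referenced in the Offense bullet is essentially an iterative FGSM with a projection step after each update. As a minimal sketch of the idea (the pgd_attack signature and the alpha and steps defaults are illustrative assumptions, not taken from attacks/pgd.py):

import torch

def pgd_attack(model, images, labels, epsilon, alpha=0.01, steps=10):
    # Keep an untouched copy of the inputs to project against
    original = images.clone().detach()
    perturbed = images.clone().detach()
    for _ in range(steps):
        perturbed.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(perturbed), labels)
        grad = torch.autograd.grad(loss, perturbed)[0]
        # Gradient-sign step, then project back into the L-infinity epsilon-ball
        perturbed = perturbed.detach() + alpha * grad.sign()
        perturbed = original + torch.clamp(perturbed - original, -epsilon, epsilon)
        perturbed = torch.clamp(perturbed, 0, 1)
    return perturbed.detach()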

Process & Flow

High-level data flow inside the lab:

[Flow diagram]

Example (FGSM attack – PyTorch)


import torch

def fgsm_attack(model, images, labels, epsilon):
    # Track gradients on the input batch so d(loss)/d(images) is available
    images = images.clone().detach().requires_grad_(True)
    outputs = model(images)
    loss = torch.nn.functional.cross_entropy(outputs, labels)
    model.zero_grad()
    loss.backward()
    # Single gradient-sign step in the direction that increases the loss,
    # then clamp back to the valid pixel range
    perturbed = images + epsilon * images.grad.sign()
    return torch.clamp(perturbed, 0, 1).detach()
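
A typical call, assuming a pretrained classifier and an image batch already scaled to [0, 1] (model, images, and labels are placeholders for objects created elsewhere in the notebooks):

perturbed = fgsm_attack(model, images, labels, epsilon=0.03)
clean_pred = model(images).argmax(dim=1)
adv_pred = model(perturbed).argmax(dim=1)
flip_rate = (clean_pred != adv_pred).float().mean()  # fraction of predictions the attack flipped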

Repository Structure

attacks/ – Adversarial methods
  • fgsm.py – Fast Gradient Sign Method
  • pgd.py – Projected Gradient Descent
  • deepfool.py – Decision-boundary attack
defenses/ – Mitigation strategies
  • adversarial_training.py – Train with adversarial examples
  • preprocessing.py – JPEG compression and denoising (see the sketch after this section)
app/ – Streamlit interface
  • app.py – Main UI logic and controls
  • components/ – Reusable widgets and charts
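
preprocessing.py itself is not reproduced here; a minimal sketch of the JPEG-compression defense it refers to might look like the following (the jpeg_compress name and the quality default are illustrative assumptions):

import io
import torch
from PIL import Image
from torchvision.transforms import functional as TF

def jpeg_compress(images, quality=75):
    # Re-encode each image as JPEG; the lossy step tends to smooth out
    # the high-frequency perturbations that gradient-based attacks add.
    out = []
    for img in images:  # images: (N, C, H, W) floats in [0, 1]
        pil = TF.to_pil_image(img.cpu())
        buf = io.BytesIO()
        pil.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        out.append(TF.to_tensor(Image.open(buf).convert("RGB")))
    return torch.stack(out).to(images.device)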

Artifacts & Links