Adversarial Image Attack Lab
Generating and defending against adversarial examples for image classifiers (FGSM, PGD, DeepFool).
This lab explores how small, carefully crafted perturbations can cause convolutional neural networks to misclassify images. We implement common attack algorithms (FGSM, PGD, DeepFool), measure the resulting drop in model accuracy, and test defenses such as adversarial training and input preprocessing.
Tech Stack
- Python · PyTorch · torchvision
- Streamlit · Jupyter Notebooks
- Docker · GitHub Actions CI
Project Overview
- Offense: generate adversarial images with FGSM, PGD and DeepFool.
- Defense: apply adversarial training and preprocessing filters (see the training-step sketch after this list).
- Visualization: Streamlit UI comparing original vs adversarial outputs.
- Reproducibility: Dockerfile and CI workflow for portability.
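As a quick illustration of the defense bullet above, here is a rough sketch of a single adversarial-training step, assuming a standard PyTorch setup; the function name, the attack_fn argument, and the epsilon default are illustrative assumptions rather than the lab's actual implementation.

import torch

def adversarial_training_step(model, images, labels, optimizer, attack_fn, epsilon=0.03):
    # Craft adversarial examples for the current batch,
    # e.g. with the fgsm_attack shown under Process & Flow below
    model.eval()
    adv_images = attack_fn(model, images, labels, epsilon).detach()

    # Standard supervised update, but on the perturbed batch
    model.train()
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(adv_images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()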
Process & Flow
High-level data flow inside the lab:
dataset → pretrained classifier → attack (FGSM / PGD / DeepFool) → adversarial image → optional defense → evaluation and Streamlit visualization
Example (FGSM attack – PyTorch)
import torch

def fgsm_attack(model, images, labels, epsilon):
    # Work on a detached copy of the batch that tracks gradients w.r.t. the input
    images = images.clone().detach().requires_grad_(True)
    # Forward pass and loss against the true labels
    outputs = model(images)
    loss = torch.nn.functional.cross_entropy(outputs, labels)
    # Backpropagate to obtain the gradient of the loss w.r.t. the input pixels
    model.zero_grad()
    loss.backward()
    # Single gradient-sign step, then clamp back to the valid pixel range
    perturbed = images + epsilon * images.grad.sign()
    return torch.clamp(perturbed, 0, 1)
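Example (PGD attack – PyTorch)
PGD repeats small FGSM-style steps and projects the result back into an epsilon-ball around the clean image. The sketch below is a minimal illustration; the function name pgd_attack and the alpha/steps defaults are assumptions and may not match the interface in attacks/pgd.py.

import torch

def pgd_attack(model, images, labels, epsilon, alpha=0.01, steps=10):
    # Keep the clean batch around for the projection step
    original = images.clone().detach()
    perturbed = images.clone().detach()
    for _ in range(steps):
        perturbed.requires_grad_(True)
        loss = torch.nn.functional.cross_entropy(model(perturbed), labels)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            # Gradient-sign step, then project into the epsilon-ball and valid pixel range
            perturbed = perturbed + alpha * perturbed.grad.sign()
            perturbed = original + torch.clamp(perturbed - original, -epsilon, epsilon)
            perturbed = torch.clamp(perturbed, 0, 1)
    return perturbed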
Repository Structure
attacks/ – Adversarial methods
- fgsm.py – Fast Gradient Sign Method
- pgd.py – Projected Gradient Descent
- deepfool.py – Decision-boundary attack
defenses/ – Mitigation strategies
- adversarial_training.py – Train with adversarial examples
- preprocessing.py – JPEG compression and denoising
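As a rough illustration of the preprocessing defense, the sketch below round-trips a batch of image tensors through JPEG encoding with Pillow; the function name jpeg_compress and the quality default are assumptions for illustration and may differ from what preprocessing.py actually implements.

import io
import torch
from PIL import Image
from torchvision import transforms

def jpeg_compress(images, quality=75):
    # Re-encode each image as JPEG to discard high-frequency perturbations
    to_pil, to_tensor = transforms.ToPILImage(), transforms.ToTensor()
    compressed = []
    for img in images:
        buffer = io.BytesIO()
        to_pil(img.cpu()).save(buffer, format="JPEG", quality=quality)
        buffer.seek(0)
        compressed.append(to_tensor(Image.open(buffer).convert("RGB")))
    return torch.stack(compressed).to(images.device)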
app/ – Streamlit Interface
- app.py – Main UI logic and controls
- components/ – Reusable widgets and charts
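For orientation, a minimal side-by-side comparison in Streamlit might look like the sketch below; the placeholder images and the slider are illustrative and do not reflect the actual contents of app.py.

import numpy as np
import streamlit as st

st.title("Adversarial Image Attack Lab")
epsilon = st.slider("Perturbation strength (epsilon)", 0.0, 0.3, 0.03)

# Placeholder images; the real app would show a dataset sample and its attacked version
original = np.random.rand(224, 224, 3)
adversarial = np.clip(original + epsilon * np.sign(np.random.randn(224, 224, 3)), 0.0, 1.0)

col_clean, col_adv = st.columns(2)
col_clean.image(original, caption="Original")
col_adv.image(adversarial, caption=f"Adversarial (eps={epsilon:.2f})")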
