Model Extraction & API Abuse

Reconstruct ML models via API queries and test countermeasures like rate-limiting, noise, and watermarking.

Marcos Martín

This project simulates model extraction attacks on ML-as-a-Service APIs, where adversaries query a deployed model and train a surrogate to approximate its decision boundary. It measures the fidelity of the extracted model compared to the original and evaluates defensive strategies such as rate-limiting, randomized outputs, and watermark verification.

Tech Stack

  • Python · FastAPI · PyTorch · Scikit-learn
  • Black-box query-based extraction algorithms
  • Logging, anomaly-based detection, and noise-injection defenses (see the defense sketch after the extraction example)

Example (Basic Extraction Loop)


import requests
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Draw random query points covering the 20-dimensional input space.
X_train = np.random.rand(1000, 20)

# Label each point by querying the black-box API (payload shape depends on the target).
y_labels = [
    requests.post("https://target.api/predict", json=x.tolist(), timeout=10).json()["label"]
    for x in X_train
]

# Fit a local surrogate on the (query, response) pairs.
surrogate = DecisionTreeClassifier().fit(X_train, y_labels)
print("Extracted surrogate model trained successfully!")

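Example (Noise Injection Defense)

On the defense side, here is a minimal sketch of the randomized-output idea, assuming a FastAPI service wraps the deployed model. The stand-in LogisticRegression, the NOISE_SCALE value, and the request schema are illustrative assumptions, not the project's actual service.

import numpy as np
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.linear_model import LogisticRegression

app = FastAPI()
rng = np.random.default_rng(0)

# Stand-in for the deployed model; the real service would load a trained artifact.
model = LogisticRegression().fit(rng.random((200, 20)), rng.integers(0, 2, 200))

NOISE_SCALE = 0.05  # assumed noise level; tune against the fidelity metrics below

class Query(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(q: Query):
    probs = model.predict_proba(np.array(q.features).reshape(1, -1))[0]
    # Perturb the confidences so repeated queries leak less exact boundary
    # geometry, then re-normalize to keep a valid probability vector.
    noisy = np.clip(probs + rng.normal(0.0, NOISE_SCALE, probs.shape), 1e-6, None)
    noisy /= noisy.sum()
    return {"label": int(np.argmax(noisy)), "probs": noisy.tolist()}
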
Project Highlights

  • Attack Simulation: Perform black-box extraction with adaptive query selection (a selection sketch follows this list).
  • Defense Testing: Implement rate-limiting, response noise (sketched above), and watermark-based model ownership checks.
  • Evaluation: Compare accuracy, confidence divergence, and fidelity metrics between the original and surrogate models (a fidelity sketch also follows).
  • Visualization: Decision boundary plots and similarity heatmaps via Matplotlib and Streamlit.
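
Example (Adaptive Query Selection)

One plausible form of adaptive selection is margin-based uncertainty sampling: points where the current surrogate's top two class probabilities are closest sit near its decision boundary and are queried next. A minimal sketch; the query_api helper, pool size, batch size, and round count are all illustrative assumptions.

import numpy as np
import requests
from sklearn.tree import DecisionTreeClassifier

def query_api(x):
    # Same black-box call as the basic loop; the payload shape is an assumption.
    return requests.post("https://target.api/predict", json=x.tolist(), timeout=10).json()["label"]

def select_uncertain(surrogate, pool, k=50):
    # Margin sampling: a small gap between the top two class probabilities
    # means the point lies near the surrogate's current decision boundary.
    probs = surrogate.predict_proba(pool)  # assumes at least two classes seen so far
    top2 = np.sort(probs, axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]
    return pool[np.argsort(margin)[:k]]

# Seed with random queries, then refine over uncertainty-guided rounds.
X = np.random.rand(200, 20)
y = [query_api(x) for x in X]
surrogate = DecisionTreeClassifier().fit(X, y)

for _ in range(5):  # illustrative round count
    pool = np.random.rand(5000, 20)
    batch = select_uncertain(surrogate, pool)
    X = np.vstack([X, batch])
    y += [query_api(x) for x in batch]
    surrogate = DecisionTreeClassifier().fit(X, y)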

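Example (Fidelity Check)

Fidelity, in the sense used above, can be scored as label agreement between the target and the surrogate on fresh inputs; this sketch reuses the query_api helper and surrogate from the previous example. Confidence divergence would compare the returned probability vectors analogously.

import numpy as np

# Fresh inputs that were never used to train the surrogate.
X_test = np.random.rand(500, 20)
y_api = np.array([query_api(x) for x in X_test])  # labels from the target model
y_sur = surrogate.predict(X_test)                 # labels from the extracted copy

fidelity = np.mean(y_api == y_sur)  # fraction of agreeing labels
print(f"Surrogate fidelity (label agreement): {fidelity:.2%}")
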
Artifacts