AI-based Malware Detection

Classify binaries and network traces using ML models with adversarial robustness evaluation.

Marcos Martín

This lab applies machine learning to malware classification across both static and dynamic analysis pipelines. It covers feature extraction from binaries, training of gradient-boosted and deep models, and adversarial evasion testing to assess robustness against obfuscated and polymorphic malware.

Tech Stack

  • Python · Scikit-learn · XGBoost · PyTorch
  • Static & dynamic feature extraction (PE headers, API calls, opcodes)
  • Adversarial evasion tests and feature-importance explainability
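As a rough illustration of the feature types listed above, the following sketch computes byte entropy, opcode frequencies, and API-call bigram counts using only the standard library. The function names, the opcode vocabulary, and the sample call trace are hypothetical and are not taken from the lab's code; a real pipeline would obtain opcodes from a disassembler and API calls from a sandbox trace.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input, at most 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def opcode_frequencies(opcodes, vocabulary):
    """Relative frequency of each vocabulary opcode in a disassembled sequence."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero on empty input
    return [counts[op] / total for op in vocabulary]


def api_bigrams(calls):
    """Count consecutive API-call pairs observed in a dynamic trace."""
    return Counter(zip(calls, calls[1:]))


# Hypothetical inputs, for illustration only.
vocab = ["mov", "push", "call"]
freqs = opcode_frequencies(["mov", "mov", "push", "ret"], vocab)
trace = ["CreateFileW", "WriteFile", "CreateFileW", "WriteFile"]
bigrams = api_bigrams(trace)
```

High entropy is often used as a hint that a section is packed or encrypted, while the frequency and bigram vectors can be concatenated into a fixed-length input for a tree-based model.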

Example (SHAP explanations for an XGBoost model)


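from xgboost import XGBClassifier
import shap

# X_train, y_train, and X_test are assumed to be prepared by the feature-extraction step.
model = XGBClassifier().fit(X_train, y_train)

# Compute per-sample, per-feature SHAP attributions for the test set.
explainer = shap.Explainer(model)
shap_values = explainer(X_test)

# Summarize which features drive the predictions and in which direction.
shap.summary_plot(shap_values, X_test)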
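The summary plot orders features by their mean absolute SHAP value. The sketch below reproduces that ranking on a small hand-written attribution matrix so the idea can be checked without a trained model. The function name, feature names, and numbers are hypothetical; in a real run the rows would typically come from `shap_values.values`.

```python
def rank_features_by_mean_abs(attributions, feature_names):
    """Rank features by mean absolute attribution across samples, largest first."""
    n_samples = len(attributions)
    means = [
        sum(abs(row[j]) for row in attributions) / n_samples
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda pair: pair[1], reverse=True)


# Hypothetical attributions: 2 samples x 3 features.
attributions = [
    [0.5, -1.0, 0.0],
    [-0.5, 3.0, 0.25],
]
names = ["section_entropy", "import_count", "file_size"]
ranking = rank_features_by_mean_abs(attributions, names)
```

Taking the absolute value before averaging matters: a feature that pushes some samples toward "malicious" and others toward "benign" would otherwise cancel out and look unimportant.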

Project Highlights

  • Static Analysis: Extract PE header, string entropy, and opcode frequency features.
  • Dynamic Analysis: Model simulated runtime behaviors and API call sequences.
  • Adversarial Evaluation: Generate evasion examples using feature mutation and adversarial embedding perturbations.
  • Visualization: SHAP-based model explainability and confusion matrix dashboards.
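The feature-mutation idea in the adversarial evaluation step can be sketched as a greedy, addition-only attack on binary features, followed by an evasion-rate measurement. This is a minimal illustration under stated assumptions, not the lab's implementation: the linear scorer stands in for the trained model, and the weights, bias, samples, and function names are all hypothetical.

```python
def score(weights, bias, x):
    """Linear malware score; a sample is flagged when the score is positive."""
    return sum(w * v for w, v in zip(weights, x)) + bias


def evade_by_addition(weights, bias, x, budget):
    """Greedily flip absent binary features to 1, most benign-leaning first.

    Only 0 -> 1 flips are allowed, a common stand-in for mutations that add
    content (imports, strings, padding) without removing functionality.
    Returns the mutated vector and whether it now evades detection.
    """
    x = list(x)
    # Candidates: features that are absent and carry a negative (benign) weight.
    candidates = sorted(
        (i for i, v in enumerate(x) if v == 0 and weights[i] < 0),
        key=lambda i: weights[i],
    )
    for i in candidates[:budget]:
        if score(weights, bias, x) <= 0:
            break  # already evading, no further mutation needed
        x[i] = 1
    return x, score(weights, bias, x) <= 0


def evasion_rate(weights, bias, malware_samples, budget):
    """Fraction of initially detected malware samples that evade after mutation."""
    detected = [x for x in malware_samples if score(weights, bias, x) > 0]
    if not detected:
        return 0.0
    evaded = sum(evade_by_addition(weights, bias, x, budget)[1] for x in detected)
    return evaded / len(detected)


# Hypothetical toy detector and samples.
weights = [2.0, 1.0, -1.5, -1.0]
bias = -0.5
samples = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]]
rate_budget_1 = evasion_rate(weights, bias, samples, budget=1)
rate_budget_2 = evasion_rate(weights, bias, samples, budget=2)
```

Reporting the evasion rate as a function of the mutation budget gives a simple robustness curve. Against a gradient-boosted or deep model, the candidate ordering would need to come from the model itself, for example from feature attributions or gradients, rather than from fixed linear weights.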

Artifacts