# Fast-MIA
Fast-MIA is a framework for efficiently evaluating Membership Inference Attacks (MIA) against Large Language Models (LLMs). This tool enables fast execution of representative membership inference methods using vLLM.
## Features

- Reduced Execution Time: Efficiently runs multiple inference methods using vLLM and result caching while preserving evaluation accuracy.
- Cross-Method Evaluation: Compare and evaluate methods (LOSS, PPL/zlib, Min-K% Prob, etc.) under the same conditions.
- Flexibility & Extensibility: Easily change models, datasets, evaluation methods, and parameters using YAML configuration files.
- Multiple Data Formats: Supports CSV, JSON, JSONL, Parquet, and Hugging Face Datasets.
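To give a feel for the YAML-driven setup, here is an illustrative sketch of what a run configuration could look like. The field names below are hypothetical and do not reflect Fast-MIA's actual schema; see `config/sample.yaml` in the repository for the real format.

```yaml
# Hypothetical config sketch -- field names are illustrative only,
# not Fast-MIA's actual schema. See config/sample.yaml for the real one.
model: EleutherAI/pythia-2.8b      # target model, served via vLLM
dataset:
  path: data/members.jsonl         # CSV / JSON / JSONL / Parquet / HF dataset
  text_column: text
methods:                           # method identifiers to compare
  - loss
  - zlib
  - mink
```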
## Quick Start

### Environment
Fast-MIA runs on Linux with NVIDIA GPUs and has the same GPU requirements as vLLM. For reference, a run takes a few minutes on an NVIDIA A100 80GB.
### Installation

```bash
# install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# clone repository
git clone https://github.com/Nikkei/fast-mia.git

# install dependencies
cd fast-mia
uv sync
source .venv/bin/activate
```
### Execution

```bash
uv run --with 'vllm==0.15.1' python main.py --config config/sample.yaml
```

Note: When using T4 GPUs (e.g., Google Colab), set the environment variable `VLLM_ATTENTION_BACKEND=XFORMERS` to avoid attention backend issues:

```bash
VLLM_ATTENTION_BACKEND=XFORMERS uv run --with 'vllm==0.15.1' python main.py --config config/sample.yaml
```
### Detailed Report Mode

For benchmarking with detailed outputs (metadata, per-sample scores, visualizations):

```bash
uv run --with 'vllm==0.15.1' python main.py --config config/sample.yaml --detailed-report
```
## Supported MIA Methods

Fast-MIA supports the following MIA methods:

| Type | Method Name (identifier) | Description |
|---|---|---|
| Baseline | LOSS (`loss`) | Uses the model's loss |
| Baseline | PPL/zlib (`zlib`) | Uses the ratio of information content calculated by Zlib compression |
| Baseline | Ref (`ref`) | Uses the difference in loss between the target model and a reference model |
| Token distribution | Min-K% Prob (`mink`) | https://github.com/swj0419/detect-pretrain-code |
| Token distribution | DC-PDD (`dcpdd`) | https://github.com/zhang-wei-chao/DC-PDD |
| Text alteration | Lowercase (`lower`) | Uses the ratio of loss after lowercasing the text |
| Text alteration | PAC (`pac`) | https://github.com/yyy01/PAC |
| Text alteration | ReCaLL (`recall`) | https://github.com/ruoyuxie/recall |
| Text alteration | Con-ReCall (`conrecall`) | https://github.com/WangCheng0116/CON-RECALL |
| Black-box | SaMIA (`samia`) | https://github.com/nlp-titech/samia |
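As a concrete example of how a score-based method from the table works, below is a minimal sketch of Min-K% Prob (the `mink` identifier): it averages the log-probabilities of the k% least likely tokens, since member texts tend to contain fewer surprisingly low-probability tokens. This is a generic reimplementation of the published method for illustration, not Fast-MIA's own code.

```python
def mink_score(token_logprobs: list[float], k: float = 0.2) -> float:
    """Min-K% Prob membership score: the average log-probability of the
    k% lowest-probability tokens. Less negative scores suggest the text
    is more likely to be a training-set member."""
    n = max(1, int(len(token_logprobs) * k))
    lowest = sorted(token_logprobs)[:n]  # the k% least likely tokens
    return sum(lowest) / n

# Toy per-token log-probs (in practice these come from the target LLM):
score = mink_score([-0.1, -2.0, -0.5, -3.0, -0.2], k=0.4)
```

In practice the per-token log-probabilities are obtained from the target model (e.g. via vLLM's prompt log-probability output), and the score is thresholded or ranked across candidate texts.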
## Documentation
- API Reference - Detailed API documentation
## Links
## License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
## Reference

```bibtex
@misc{takahashi_ishihara_fastmia,
  author = {Hiromu Takahashi and Shotaro Ishihara},
  title  = {{Fast-MIA}: Efficient and Scalable Membership Inference for LLMs},
  year   = {2025},
  eprint = {arXiv:2510.23074},
  url    = {https://arxiv.org/abs/2510.23074}
}
```