# API Reference

## Core Modules
### config

**Classes:**

- `Config` – Class for loading config files

#### Config

```python
Config(config_path: str | Path)
```

Class for loading config files.

**Attributes:**

- `data` (`dict[str, Any]`) – Get data config
- `lora` (`dict[str, Any] | None`) – Get LoRA config
- `methods` (`list[dict[str, Any]]`) – Get method configs
- `model` (`dict[str, Any]`) – Get model config
- `sampling_parameters` (`dict[str, Any]`) – Get sampling_parameters config
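The attribute list above implies a config file with top-level `model`, `data`, `methods`, and `sampling_parameters` sections, plus an optional `lora` section. A minimal sketch of that shape; the nested keys (`name`, `data_path`, `ratio`, etc.) are illustrative assumptions, not the library's documented schema:

```python
import json

# Hypothetical config mirroring the Config attributes above; the nested
# keys are assumptions for illustration only.
config_text = """
{
  "model":  {"name": "meta-llama/Llama-2-7b-hf"},
  "data":   {"data_path": "data/train.csv", "data_format": "csv"},
  "methods": [
    {"type": "loss"},
    {"type": "mink", "params": {"ratio": 0.2}}
  ],
  "sampling_parameters": {"temperature": 0.0}
}
"""

config = json.loads(config_text)
assert set(config) == {"model", "data", "methods", "sampling_parameters"}
assert config["methods"][1]["params"]["ratio"] == 0.2
```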
### data_loader

**Classes:**

- `DataLoader` – Data loader class

#### DataLoader

```python
DataLoader(
    data_path: str | Path | None = None,
    data_format: str = "csv",
    text_column: str = "text",
    label_column: str = "label",
)
```

Data loader class.

**Parameters:**

- `data_path` (`str | Path | None`, default: `None`) – Path to the data (file or directory, or dataset name for huggingface format)
- `data_format` (`str`, default: `'csv'`) – Data format (`"csv"`, `"jsonl"`, `"json"`, `"parquet"`, `"huggingface"`)
- `text_column` (`str`, default: `'text'`) – Name of the text column
- `label_column` (`str`, default: `'label'`) – Name of the label column

**Methods:**

- `get_data` – Get data
- `load_mimir` – Load Mimir dataset with fixed text length constraints
- `load_wikimia` – Load WikiMIA dataset with specified text length
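For the default `csv` format, the loader's job amounts to pulling the configured text and label columns out of a file. A self-contained sketch of that behavior using only the standard library (the real class may differ in validation and return types):

```python
import csv
import io

def load_csv(source, text_column="text", label_column="label"):
    """Read (text, label) pairs from a CSV source, mirroring the
    DataLoader defaults for column names."""
    reader = csv.DictReader(source)
    texts, labels = [], []
    for row in reader:
        texts.append(row[text_column])
        labels.append(int(row[label_column]))
    return texts, labels

# In membership inference, label 1 conventionally marks training members.
sample = io.StringIO("text,label\nhello world,1\nunseen text,0\n")
texts, labels = load_csv(sample)
# texts == ["hello world", "unseen text"], labels == [1, 0]
```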
##### get_data

##### load_mimir *(staticmethod)*

```python
load_mimir(data_path: str, token: str) -> DataLoader
```

Load Mimir dataset with fixed text length constraints.

**Parameters:**

- `data_path` (`str`) – Path to the data (dataset name for huggingface format)
- `token` (`str`) – Hugging Face token

**Returns:**

- `DataLoader` – DataLoader instance

##### load_wikimia *(staticmethod)*

```python
load_wikimia(text_length: int) -> DataLoader
```

Load WikiMIA dataset with specified text length.

**Returns:**

- `DataLoader` – DataLoader instance
### evaluator

**Classes:**

- `EvaluationResult` – Container for evaluation results with detailed information
- `Evaluator` – Evaluator for membership inference attacks

#### EvaluationResult *(dataclass)*

```python
EvaluationResult(
    results_df: DataFrame,
    detailed_results: list[dict[str, Any]],
    labels: list[int],
    data_stats: dict[str, Any],
    cache_stats: dict[str, Any] = dict(),
)
```

Container for evaluation results with detailed information.

#### Evaluator

```python
Evaluator(
    data_loader: DataLoader,
    model_loader: ModelLoader,
    methods: list[BaseMethod],
    max_cache_size: int = 1000,
)
```

Evaluator for membership inference attacks.

**Parameters:**

- `data_loader` (`DataLoader`) – Data loader
- `model_loader` (`ModelLoader`) – Model loader
- `methods` (`list[BaseMethod]`) – List of methods to use for evaluation
- `max_cache_size` (`int`, default: `1000`) – Maximum cache size

**Methods:**

- `evaluate` – Evaluate membership inference attacks on data with specified number of words

##### evaluate

```python
evaluate(config: Config) -> EvaluationResult
```

Evaluate membership inference attacks on data with specified number of words.

**Parameters:**

- `config` (`Config`) – Configuration

**Returns:**

- `EvaluationResult` – EvaluationResult containing DataFrame, detailed results, labels, and stats
### model_loader

**Classes:**

- `ModelLoader` – vLLM model loader class

#### ModelLoader

```python
ModelLoader(model_config: dict[str, Any])
```

vLLM model loader class.

**Methods:**

- `get_lora_request` – Get LoRA request
- `get_sampling_params` – Get sampling parameters

##### get_lora_request

```python
get_lora_request(lora_config: dict[str, Any]) -> LoRARequest
```
### utils

**Functions:**

- `fix_seed` – Fix random seed
- `get_metrics` – Calculate evaluation metrics
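Membership inference evaluation typically reports threshold-free metrics such as ROC AUC over per-example scores. A minimal self-contained sketch of one such metric (illustrative; `get_metrics` may compute a different or larger set):

```python
def roc_auc(scores, labels):
    """ROC AUC via the rank statistic: the probability that a randomly
    chosen member outscores a randomly chosen non-member, counting ties
    as half. Higher scores are assumed to indicate membership."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
# auc == 1.0: members are perfectly separated from non-members
```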
## MIA Methods

### methods

**Modules:**

- `base`
- `conrecall`
- `dcpdd`
- `factory`
- `loss`
- `lower`
- `mink`
- `pac`
- `prefix_utils`
- `recall`
- `ref`
- `samia`
- `zlib`

**Classes:**

- `BaseMethod` – Base class for membership inference methods
- `CONReCaLLMethod` – Con-ReCall membership inference method
- `DCPDDMethod` – DC-PDD membership inference method
- `LossMethod` – Loss (log-likelihood) based membership inference method
- `LowerMethod` – Lower-based membership inference method
- `MethodFactory` – Method factory class
- `MinKMethod` – Min-K% Prob based membership inference method
- `PACMethod` – PAC (Polarized Augment Calibration) based membership inference method
- `ReCaLLMethod` – ReCaLL membership inference method
- `RefMethod` – Reference model based membership inference method
- `SaMIAMethod` – SaMIA membership inference method
- `ZlibMethod` – Zlib compression-based membership inference method
#### BaseMethod

```python
BaseMethod(method_name: str, method_config: dict[str, Any] = None)
```

Base class for membership inference methods.

**Parameters:**

- `method_name` (`str`) – Name of the method
- `method_config` (`dict[str, Any]`, default: `None`) – Method configuration

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate score
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output *(abstractmethod)*

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
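The cache-management surface (`get_cache_stats`, `clear_cache`, `set_max_cache_size`, `max_cache_size`) suggests a bounded score cache with hit/miss bookkeeping. A common way to implement that shape, as a sketch rather than the library's actual internals:

```python
from collections import OrderedDict

class ScoreCache:
    """Bounded LRU cache with hit/miss statistics, sketching the caching
    behavior implied by BaseMethod's cache methods (names are illustrative)."""

    def __init__(self, max_size=1000):
        self.max_size = max_size
        self._store = OrderedDict()
        self.hits = self.misses = 0

    def get(self, key):
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)      # mark as recently used
            return self._store[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._store[key] = value
        self._store.move_to_end(key)
        while len(self._store) > self.max_size:
            self._store.popitem(last=False)   # evict least recently used

    def stats(self):
        total = self.hits + self.misses
        return {"hits": self.hits, "misses": self.misses,
                "hit_rate": self.hits / total if total else 0.0,
                "size": len(self._store)}

cache = ScoreCache(max_size=2)
cache.put("a", 1.0)
cache.put("b", 2.0)
cache.get("a")                # hit; "a" becomes most recently used
cache.put("c", 3.0)           # evicts "b", the least recently used entry
# cache.stats() -> {'hits': 1, 'misses': 0, 'hit_rate': 1.0, 'size': 2}
```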
#### CONReCaLLMethod

```python
CONReCaLLMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Con-ReCall membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss (negative log-likelihood)
- `run` – Con-ReCall algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

```python
process_output(output: RequestOutput, prefix_token_length: int) -> float
```
#### DCPDDMethod

```python
DCPDDMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

DC-PDD membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate DC-PDD score
- `run` – DC-PDD algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output
#### LossMethod

```python
LossMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Loss (log-likelihood) based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
#### LowerMethod

```python
LowerMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Lower-based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss
- `run` – Run Lower algorithm and calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run Lower algorithm and calculate scores for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
#### MethodFactory

Method factory class.

**Methods:**

- `create_method` – Create a method
- `create_methods` – Create multiple methods

##### create_method *(staticmethod)*

```python
create_method(method_config: dict[str, Any]) -> BaseMethod
```

Create a method.

**Parameters:**

- `method_config` (`dict[str, Any]`) – Method configuration:
    - `type`: Type of method (`'loss'`, `'lower'`, `'zlib'`, `'mink'`, `'pac'`, `'recall'`, `'conrecall'`, `'samia'`, `'dcpdd'`, `'ref'`)
    - `params`: Method-specific parameters

**Returns:**

- `BaseMethod` – Created method

**Raises:**

- `ValueError` – If an unknown method type is specified

##### create_methods *(staticmethod)*

```python
create_methods(methods_config: list[dict[str, Any]]) -> list[BaseMethod]
```
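A factory with this contract (dispatch on `config["type"]`, forward `config["params"]`, raise `ValueError` for unknown types) is commonly implemented as a registry lookup. A minimal self-contained sketch; the scorer classes and registry contents here are illustrative stand-ins, not the library's internals:

```python
class LossScorer:
    """Hypothetical stand-in for a loss-based method."""
    def __init__(self, params=None):
        self.params = params or {}

class MinKScorer:
    """Hypothetical stand-in for a Min-K% method."""
    def __init__(self, params=None):
        self.params = params or {}

# Registry maps the config "type" string to the class to instantiate.
_REGISTRY = {"loss": LossScorer, "mink": MinKScorer}

def create_method(method_config):
    method_type = method_config["type"]
    if method_type not in _REGISTRY:
        raise ValueError(f"Unknown method type: {method_type!r}")
    return _REGISTRY[method_type](method_config.get("params"))

method = create_method({"type": "mink", "params": {"ratio": 0.2}})
# method.params == {"ratio": 0.2}
```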
#### MinKMethod

```python
MinKMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Min-K% Prob based membership inference method.

**Parameters:**

- `method_config` (`dict[str, Any]`, default: `None`) – Method configuration:
    - `ratio`: Ratio of lowest probability tokens to use (0.0–1.0)

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate Min-K% score
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
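The core of the Min-K% Prob score is simple once per-token log-probabilities are available: keep only the `ratio` fraction of tokens the model found least likely, and average their log-probabilities. A self-contained sketch over raw log-prob lists (the real method extracts these from vLLM outputs):

```python
def min_k_score(token_logprobs, ratio=0.2):
    """Min-K% Prob sketch: mean log-probability of the lowest k% of tokens.
    Member texts tend to contain fewer surprisingly improbable tokens,
    so higher (less negative) scores suggest membership."""
    k = max(1, int(len(token_logprobs) * ratio))
    lowest = sorted(token_logprobs)[:k]
    return sum(lowest) / k

# A single very surprising token dominates the score at ratio=0.2 (k=1):
score = min_k_score([-0.1, -0.2, -8.0, -0.3, -0.1], ratio=0.2)
# score == -8.0
```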
#### PACMethod

```python
PACMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

PAC (Polarized Augment Calibration) based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate Polarized Distance
- `run` – PAC algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output
#### ReCaLLMethod

```python
ReCaLLMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

ReCaLL membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss (negative log-likelihood)
- `run` – ReCaLL algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

```python
process_output(output: RequestOutput, prefix_token_length: int) -> float
```
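The `process_output(output, prefix_token_length)` signature suggests the loss is computed only over tokens that follow the conditioning prefix. A sketch of that slicing and the resulting ReCaLL-style ratio over raw token log-probabilities (illustrative; the real method reads these values from vLLM `RequestOutput` objects):

```python
def nll_after_prefix(token_logprobs, prefix_token_length):
    """Mean negative log-likelihood of the tokens following the prefix."""
    target = token_logprobs[prefix_token_length:]
    return -sum(target) / len(target)

def recall_score(with_prefix_logprobs, plain_logprobs, prefix_token_length):
    """ReCaLL-style ratio sketch: NLL of the text conditioned on a
    non-member prefix, divided by its unconditional NLL. Ratios below 1
    mean the prefix made the text easier to predict."""
    conditional = nll_after_prefix(with_prefix_logprobs, prefix_token_length)
    unconditional = nll_after_prefix(plain_logprobs, 0)
    return conditional / unconditional

score = recall_score([-0.5, -0.5, -1.0, -1.0], [-2.0, -2.0], 2)
# conditional NLL = 1.0, unconditional NLL = 2.0 -> score == 0.5
```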
#### RefMethod

Inherits from `BaseMethod`.

Reference model based membership inference method.

**Parameters:**

- `method_config` – Method configuration

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss
- `run` – Ref algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Ref algorithm to calculate scores for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model (target)
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- List of Ref scores
#### SaMIAMethod

```python
SaMIAMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

SaMIA membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Calculate SaMIA score from a single model output
- `run` – SaMIA algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

Calculate SaMIA score from a single model output.

Note: this method is called from `BaseMethod.run`, but SaMIA uses a custom implementation based on multiple samples, so it is not supported for a single output. Use the `run` method instead.

**Parameters:**

- `output` (`RequestOutput`) – Model output

**Returns:**

- `float` – SaMIA score
#### ZlibMethod

```python
ZlibMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Zlib compression-based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate zlib-compressed information content ratio
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
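The zlib method's "compressed information content ratio" calibrates the model's loss against how compressible the text itself is: intrinsically repetitive text is cheap for any model to predict, so dividing by its zlib-compressed size separates memorization from mere redundancy. A stdlib sketch of the ingredients (exact normalization in the library may differ):

```python
import zlib

def zlib_entropy(text):
    """Proxy for the text's intrinsic information content:
    size in bytes of its zlib-compressed UTF-8 encoding."""
    return len(zlib.compress(text.encode("utf-8")))

def zlib_score(model_nll, text):
    """Zlib-calibrated score sketch: model loss over compressed size.
    Repetitive text compresses well, so its (already low) loss is not
    over-rewarded relative to genuinely memorized text."""
    return model_nll / zlib_entropy(text)

# Highly repetitive text compresses far better than varied text:
assert zlib_entropy("a" * 200) < zlib_entropy(str(3 ** 500))
```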
### base

**Classes:**

- `BaseMethod` – Base class for membership inference methods

#### BaseMethod

```python
BaseMethod(method_name: str, method_config: dict[str, Any] = None)
```

Base class for membership inference methods.

**Parameters:**

- `method_name` (`str`) – Name of the method
- `method_config` (`dict[str, Any]`, default: `None`) – Method configuration

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate score
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output *(abstractmethod)*

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
### conrecall

**Classes:**

- `CONReCaLLMethod` – Con-ReCall membership inference method

#### CONReCaLLMethod

```python
CONReCaLLMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Con-ReCall membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss (negative log-likelihood)
- `run` – Con-ReCall algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

```python
process_output(output: RequestOutput, prefix_token_length: int) -> float
```
### dcpdd

**Classes:**

- `DCPDDMethod` – DC-PDD membership inference method

#### DCPDDMethod

```python
DCPDDMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

DC-PDD membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate DC-PDD score
- `run` – DC-PDD algorithm to calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output
### factory

**Classes:**

- `MethodFactory` – Method factory class

#### MethodFactory

Method factory class.

**Methods:**

- `create_method` – Create a method
- `create_methods` – Create multiple methods

##### create_method *(staticmethod)*

```python
create_method(method_config: dict[str, Any]) -> BaseMethod
```

Create a method.

**Parameters:**

- `method_config` (`dict[str, Any]`) – Method configuration:
    - `type`: Type of method (`'loss'`, `'lower'`, `'zlib'`, `'mink'`, `'pac'`, `'recall'`, `'conrecall'`, `'samia'`, `'dcpdd'`, `'ref'`)
    - `params`: Method-specific parameters

**Returns:**

- `BaseMethod` – Created method

**Raises:**

- `ValueError` – If an unknown method type is specified

##### create_methods *(staticmethod)*

```python
create_methods(methods_config: list[dict[str, Any]]) -> list[BaseMethod]
```
### loss

**Classes:**

- `LossMethod` – Loss (log-likelihood) based membership inference method

#### LossMethod

```python
LossMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Loss (log-likelihood) based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss
- `run` – Run inference for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run inference for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
### lower

**Classes:**

- `LowerMethod` – Lower-based membership inference method

#### LowerMethod

```python
LowerMethod(method_config: dict[str, Any] = None)
```

Inherits from `BaseMethod`.

Lower-based membership inference method.

**Methods:**

- `cleanup_model` – Release GPU memory used by a vLLM model
- `clear_cache` – Clear cache and reset statistics
- `get_cache_stats` – Get cache statistics
- `get_outputs` – Get model outputs for a list of texts
- `process_output` – Process model output and calculate loss
- `run` – Run Lower algorithm and calculate scores for a list of texts
- `set_max_cache_size` – Set maximum cache size

##### get_cache_stats *(classmethod)*

##### get_outputs

```python
get_outputs(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[RequestOutput]
```

Get model outputs for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration

**Returns:**

- `list[RequestOutput]` – List of model outputs

##### process_output

##### run

```python
run(
    texts: list[str],
    model: LLM,
    sampling_params: SamplingParams,
    lora_request: LoRARequest = None,
    data_config: dict[str, Any] = None,
) -> list[float]
```

Run Lower algorithm and calculate scores for a list of texts.

**Parameters:**

- `texts` (`list[str]`) – List of texts
- `model` (`LLM`) – LLM model
- `sampling_params` (`SamplingParams`) – Sampling parameters
- `lora_request` (`LoRARequest`, default: `None`) – LoRA request
- `data_config` (`dict[str, Any]`, default: `None`) – Data configuration
mink
¤
Classes:
-
MinKMethod–Min-K% Prob based membership inference method
MinKMethod
¤
MinKMethod(method_config: dict[str, Any] = None)
Bases: BaseMethod
Min-K% Prob based membership inference method
Parameters:
- method_config (dict[str, Any], default: None) – Method configuration. ratio: Ratio of lowest probability tokens to use (0.0-1.0)
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Process model output and calculate Min-K% score
- run – Run inference for a list of texts
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
run
¤
run(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[float]
Run inference for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[float] – List of Min-K% scores
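For intuition, the Min-K% statistic itself can be sketched in a few lines of standalone code. This is an illustrative sketch, not the library's implementation: it assumes you have already extracted per-token log-probabilities for a text, and it averages the lowest-ratio fraction of them (the tokens the model found most surprising).

```python
def mink_score(token_logprobs, ratio=0.2):
    # Sort token log-probabilities ascending and average the lowest
    # `ratio` fraction (at least one token). Higher (less negative)
    # scores suggest the text is more familiar to the model, which
    # Min-K% Prob uses as a membership signal.
    logps = sorted(token_logprobs)
    k = max(1, int(len(logps) * ratio))
    return sum(logps[:k]) / k
```
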
pac
¤
Classes:
-
PACMethod–PAC (Polarized Augment Calibration) based membership inference method
PACMethod
¤
PACMethod(method_config: dict[str, Any] = None)
Bases: BaseMethod
PAC (Polarized Augment Calibration) based membership inference method
Parameters:
- method_config (dict[str, Any], default: None) – Method configuration
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Process model output and calculate Polarized Distance
- run – PAC algorithm to calculate scores for a list of texts.
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
prefix_utils
¤
Functions:
- compute_prefix_loss – Compute negative mean log-likelihood excluding prefix tokens.
- extract_prefix – Randomly select num_shots texts from the list without modifying the original.
- process_prefix – Process prefix to fit within model's max length.
compute_prefix_loss
¤
compute_prefix_loss(output: RequestOutput, prefix_token_length: int) -> float
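A minimal standalone sketch of this computation, under the assumption that per-token log-probabilities have already been extracted from the RequestOutput (the helper name and its inputs here are hypothetical, for illustration only):

```python
def prefix_nll(token_logprobs, prefix_token_length):
    # Negative mean log-likelihood over the tokens that follow the
    # prefix; the prefix tokens are excluded so the score reflects
    # only the target text, not the conditioning context.
    suffix = token_logprobs[prefix_token_length:]
    return -sum(suffix) / len(suffix)
```
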
extract_prefix
¤
process_prefix
¤
process_prefix(
model: LLM,
tokenizer: AnyTokenizer,
prefix: list[str],
avg_length: int,
pass_window: bool,
num_shots: int,
) -> tuple[list[str], int]
Process prefix to fit within model's max length.
Parameters:
- model (LLM) – LLM model
- tokenizer (AnyTokenizer) – Tokenizer
- prefix (list[str]) – List of prefix texts
- avg_length (int) – Average token length of texts
- pass_window (bool) – If True, skip window check
- num_shots (int) – Number of shots
Returns:
- tuple[list[str], int] – Processed prefix texts and prefix token length
recall
¤
Classes:
-
ReCaLLMethod–ReCaLL membership inference method
ReCaLLMethod
¤
ReCaLLMethod(method_config: dict[str, Any] = None)
Bases: BaseMethod
ReCaLL membership inference method
Parameters:
- method_config (dict[str, Any], default: None) – Method configuration
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Process model output and calculate loss (negative log-likelihood)
- run – ReCaLL algorithm to calculate scores for a list of texts
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
process_output(output: RequestOutput, prefix_token_length: int) -> float
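For intuition, ReCaLL compares how much conditioning on a known non-member prefix changes the target text's log-likelihood. A hypothetical standalone sketch of that comparison (the function and its inputs are illustrative assumptions, not the library API):

```python
def recall_score(ll_with_prefix, ll_without_prefix):
    # Ratio of the conditional log-likelihood (text preceded by a
    # non-member prefix) to the unconditional log-likelihood.
    # Member texts tend to benefit less from the extra context,
    # so the ratio behaves differently for members and non-members.
    return ll_with_prefix / ll_without_prefix
```
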
ref
¤
Classes:
-
RefMethod–Reference model based membership inference method
RefMethod
¤
Bases: BaseMethod
Reference model based membership inference method
Parameters:
- method_config – Method configuration
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Process model output and calculate loss
- run – Ref algorithm to calculate scores for a list of texts
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
run
¤
run(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[float]
Ref algorithm to calculate scores for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model (target)
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[float] – List of Ref scores
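The calibration at the heart of reference-based scoring can be sketched as a per-text loss difference. This is an illustrative sketch under the assumption that losses from the target and reference models are already computed; it is not the library's implementation:

```python
def ref_scores(target_losses, reference_losses):
    # Calibrate each target-model loss against a reference model's loss
    # on the same text. Texts the target model memorised tend to have a
    # lower loss than the reference model predicts, so the difference
    # serves as the membership score.
    return [t - r for t, r in zip(target_losses, reference_losses)]
```
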
samia
¤
Classes:
-
SaMIAMethod–SaMIA membership inference method
Functions:
- get_suffix – Extracts a suffix from the given text, based on the specified prefix ratio and text length.
- ngrams – Generates n-grams from a sequence.
- rouge_n – Calculates the ROUGE-N score between a candidate and a reference.
SaMIAMethod
¤
SaMIAMethod(method_config: dict[str, Any] = None)
Bases: BaseMethod
SaMIA membership inference method
Parameters:
- method_config (dict[str, Any], default: None) – Method configuration
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Calculate SaMIA score from a single model output
- run – SaMIA algorithm to calculate scores for a list of texts
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
Calculate SaMIA score from a single model output. Note: BaseMethod.run calls this method, but SaMIA uses a custom implementation based on multiple samples, so it is not supported for a single output; use the run method instead.
Parameters:
- output (RequestOutput) – Model output
Returns:
- float – SaMIA score
get_suffix
¤
Extracts a suffix from the given text, based on the specified prefix ratio and text length.
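The two overlap helpers above can be sketched as follows. These are simplified stand-ins (recall-oriented ROUGE-N over token lists with clipped counts), not necessarily the exact implementations used by SaMIA:

```python
from collections import Counter

def ngrams(sequence, n):
    # All contiguous n-grams of a sequence, as tuples.
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]

def rouge_n(candidate, reference, n=1):
    # ROUGE-N recall: the fraction of reference n-grams that also
    # appear in the candidate, with per-n-gram counts clipped to the
    # reference counts.
    ref_counts = Counter(ngrams(reference, n))
    cand_counts = Counter(ngrams(candidate, n))
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0
```
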
zlib
¤
Classes:
-
ZlibMethod–Zlib compression-based membership inference method
ZlibMethod
¤
ZlibMethod(method_config: dict[str, Any] = None)
Bases: BaseMethod
Zlib compression-based membership inference method
Parameters:
- method_config (dict[str, Any], default: None) – Method configuration
Methods:
- cleanup_model – Release GPU memory used by a vLLM model
- clear_cache – Clear cache and reset statistics
- get_cache_stats – Get cache statistics
- get_outputs – Get model outputs for a list of texts
- process_output – Process model output and calculate zlib-compressed information content ratio
- run – Run inference for a list of texts
- set_max_cache_size – Set maximum cache size
get_cache_stats
classmethod
¤
get_outputs
¤
get_outputs(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[RequestOutput]
Get model outputs for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[RequestOutput] – List of model outputs
process_output
¤
run
¤
run(
texts: list[str],
model: LLM,
sampling_params: SamplingParams,
lora_request: LoRARequest = None,
data_config: dict[str, Any] = None,
) -> list[float]
Run inference for a list of texts
Parameters:
- texts (list[str]) – List of texts
- model (LLM) – LLM model
- sampling_params (SamplingParams) – Sampling parameters
- lora_request (LoRARequest, default: None) – LoRA request
- data_config (dict[str, Any], default: None) – Data configuration
Returns:
- list[float] – List of zlib scores
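For intuition, the zlib statistic normalises the model's loss by the length of the zlib-compressed text, so that texts that are easy to predict merely because they are repetitive do not stand out. A hypothetical standalone sketch, assuming the loss for the text is already computed:

```python
import zlib

def zlib_score(text, loss):
    # Divide the model loss by the byte length of the zlib-compressed
    # text, a cheap proxy for the text's information content.
    compressed_len = len(zlib.compress(text.encode("utf-8")))
    return loss / compressed_len
```
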