slideflow.mil

This submodule contains tools for multiple-instance learning (MIL) model training and evaluation. See Multiple-Instance Learning (MIL) for more information.

Main functions

mil_config(model: str | Callable, trainer: str = 'fastai', **kwargs)

Create a multiple-instance learning (MIL) training configuration.

By default, all models are trained with the FastAI trainer. CLAM models can also be trained using the original, legacy CLAM trainer. This deprecated trainer has been kept for backwards compatibility; the FastAI trainer is preferred for all models, including CLAM.

Parameters:
  • model (str, Callable) – Either the name of a model, or a custom torch module. Valid model names include "clam_sb", "clam_mb", "mil_fc", "mil_fc_mc", "attention_mil", and "transmil".

  • trainer (str) – Type of MIL trainer to use. Either ‘fastai’ or ‘clam’. All models (including CLAM) can be trained with ‘fastai’. The deprecated, legacy ‘clam’ trainer is only available for CLAM models, and has been kept for backwards compatibility. Defaults to ‘fastai’ (preferred).

  • **kwargs – All additional keyword arguments are passed to either slideflow.mil.TrainerConfigCLAM for CLAM models, or slideflow.mil.TrainerConfigFastAI for all other models.
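
A minimal sketch of creating configurations for both trainers, using only the keyword arguments documented on this page for TrainerConfigFastAI and TrainerConfigCLAM. The helper name make_example_configs and the hyperparameter values are illustrative assumptions, not recommendations. The import is deferred into the function so the block can be loaded without slideflow installed.

```python
def make_example_configs():
    """Return a (FastAI config, legacy CLAM config) pair.

    Hypothetical helper for illustration; values are arbitrary.
    """
    # Imported lazily so that merely defining this helper does not
    # require slideflow to be installed.
    from slideflow.mil import mil_config

    # Default (preferred) FastAI trainer; extra keyword arguments are
    # forwarded to the trainer configuration.
    fastai_cfg = mil_config("attention_mil", lr=1e-4, epochs=10, bag_size=256)

    # Legacy CLAM trainer, only available for CLAM models.
    legacy_cfg = mil_config("clam_sb", trainer="clam", max_epochs=20, lr=1e-4)

    return fastai_cfg, legacy_cfg
```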

train_mil(config: _TrainerConfig, train_dataset: Dataset, val_dataset: Dataset | None, outcomes: str | List[str], bags: str | List[str], *, outdir: str = 'mil', exp_label: str | None = None, **kwargs)

Train a multiple-instance learning (MIL) model.

Parameters:
  • config (slideflow.mil.TrainerConfigFastAI or slideflow.mil.TrainerConfigCLAM) – Trainer and model configuration.

  • train_dataset (slideflow.Dataset) – Training dataset.

  • val_dataset (slideflow.Dataset) – Validation dataset.

  • outcomes (str) – Outcome column (annotation header) from which to derive category labels.

  • bags (str) – Either a path to a directory with *.pt files, or a list of paths to individual *.pt files. Each file should contain exported feature vectors, with each file containing all tile features for one patient.

Keyword Arguments:
  • outdir (str) – Directory in which to save model and results.

  • exp_label (str) – Experiment label, used for naming the subdirectory in the {project root}/mil folder, where training history and the model will be saved.

  • attention_heatmaps (bool) – Generate attention heatmaps for slides. Not available for multi-modal MIL models. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.
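
The call below sketches how the parameters above might fit together. It assumes the training and validation slideflow.Dataset objects have already been prepared elsewhere; the helper name train_attention_mil and the hyperparameter values are hypothetical.

```python
def train_attention_mil(train_dataset, val_dataset, outcome, bags, outdir="mil"):
    """Train an attention-MIL model and request attention heatmaps.

    Hypothetical wrapper for illustration. `train_dataset` and
    `val_dataset` are assumed to be slideflow.Dataset objects, and
    `bags` a directory of exported *.pt feature bags.
    """
    # Deferred import: defining this helper does not require slideflow.
    from slideflow.mil import mil_config, train_mil

    config = mil_config("attention_mil", lr=1e-4, epochs=10)
    return train_mil(
        config,
        train_dataset=train_dataset,
        val_dataset=val_dataset,
        outcomes=outcome,
        bags=bags,
        outdir=outdir,
        exp_label="example",
        # Heatmap options documented under Keyword Arguments.
        attention_heatmaps=True,
        interpolation="bicubic",
        cmap="inferno",
    )
```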

train_clam(config: TrainerConfigCLAM, train_dataset: Dataset, val_dataset: Dataset, outcomes: str | List[str], bags: str | List[str], *, outdir: str = 'mil', attention_heatmaps: bool = False, **heatmap_kwargs) → None

Train a CLAM model from layer activations exported with slideflow.project.generate_features_for_clam().

See Multiple-Instance Learning (MIL) for more information.

Parameters:
  • train_dataset (slideflow.Dataset) – Training dataset.

  • val_dataset (slideflow.Dataset) – Validation dataset.

  • outcomes (str) – Outcome column (annotation header) from which to derive category labels.

  • bags (str) – Either a path to a directory with *.pt files, or a list of paths to individual *.pt files. Each file should contain exported feature vectors, with each file containing all tile features for one patient.

Keyword Arguments:
  • outdir (str) – Directory in which to save model and results.

  • exp_label (str) – Experiment label, used for naming the subdirectory in the outdir folder, where training history and the model will be saved.

  • clam_args (optional) – Namespace with clam arguments, as provided by slideflow.clam.get_args().

  • attention_heatmaps (bool) – Generate attention heatmaps for slides. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.

Returns:

None

train_fastai(config: TrainerConfigFastAI, train_dataset: Dataset, val_dataset: Dataset, outcomes: str | List[str], bags: str | List[str], *, outdir: str = 'mil', attention_heatmaps: bool = False, uq: bool = False, **heatmap_kwargs) → None

Train an aMIL model using FastAI.

Parameters:
  • train_dataset (slideflow.Dataset) – Training dataset.

  • val_dataset (slideflow.Dataset) – Validation dataset.

  • outcomes (str) – Outcome column (annotation header) from which to derive category labels.

  • bags (str) – Either a path to a directory with *.pt files, or a list of paths to individual *.pt files. Each file should contain exported feature vectors, with each file containing all tile features for one patient.

Keyword Arguments:
  • outdir (str) – Directory in which to save model and results.

  • exp_label (str) – Experiment label, used for naming the subdirectory in the {project root}/mil folder, where training history and the model will be saved.

  • lr (float) – Learning rate, or maximum learning rate if fit_one_cycle=True.

  • epochs (int) – Maximum epochs.

  • attention_heatmaps (bool) – Generate attention heatmaps for slides. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.
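
To illustrate what normalizing values below and above 0 separately means for the ‘two_slope’ option, here is a small pure-Python sketch. It is not slideflow's implementation; it only mirrors the general idea behind matplotlib.colors.TwoSlopeNorm with a center of 0. The function name two_slope_normalize is hypothetical.

```python
def two_slope_normalize(values, vcenter=0.0):
    """Map values to [0, 1] so that `vcenter` lands at 0.5.

    Values below the center are scaled into [0, 0.5] using the minimum,
    and values above it into [0.5, 1] using the maximum, so each side
    gets its own slope.
    """
    vmin, vmax = min(values), max(values)
    normalized = []
    for v in values:
        if v < vcenter:
            # Lower slope: vmin -> 0.0, vcenter -> 0.5
            normalized.append(0.5 * (v - vmin) / (vcenter - vmin))
        elif v > vcenter:
            # Upper slope: vcenter -> 0.5, vmax -> 1.0
            normalized.append(0.5 + 0.5 * (v - vcenter) / (vmax - vcenter))
        else:
            normalized.append(0.5)
    return normalized


scores = [-4.0, -2.0, 0.0, 1.0, 2.0]
print(two_slope_normalize(scores))
```

With a single linear scale, the asymmetric range in this example would push 0 away from the midpoint of the colormap; the two-slope mapping keeps it centered.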

Returns:

fastai.learner.Learner

build_fastai_learner(config: TrainerConfigFastAI, train_dataset: Dataset, val_dataset: Dataset, outcomes: str | List[str], bags: str | ndarray | List[str], *, outdir: str = 'mil', return_shape: bool = False) → Learner

Build a FastAI Learner for training an aMIL model.

Parameters:
  • train_dataset (slideflow.Dataset) – Training dataset.

  • val_dataset (slideflow.Dataset) – Validation dataset.

  • outcomes (str) – Outcome column (annotation header) from which to derive category labels.

  • bags (str) – List of paths to individual *.pt files. Each file should contain exported feature vectors, with each file containing all tile features for one patient.

Keyword Arguments:
  • outdir (str) – Directory in which to save model and results.

  • exp_label (str) – Experiment label, used for naming the subdirectory in the outdir folder, where training history and the model will be saved.

  • lr (float) – Learning rate, or maximum learning rate if fit_one_cycle=True.

  • epochs (int) – Maximum epochs.

Returns:

fastai.learner.Learner

eval_mil(weights: str, dataset: Dataset, outcomes: str | List[str], bags: str | List[str], config: _TrainerConfig | None = None, *, outdir: str = 'mil', attention_heatmaps: bool = False, uq: bool = False, aggregation_level: str | None = None, **heatmap_kwargs) → DataFrame

Evaluate a multiple-instance learning model.

Saves results for the evaluation in the target folder, including predictions (parquet format), attention (Numpy format for each slide), and attention heatmaps (if attention_heatmaps=True).

Logs classifier metrics (AUROC and AP) to the console.

Parameters:
  • weights (str) – Path to model weights to load.

  • dataset (sf.Dataset) – Dataset to evaluate.

  • outcomes (str, list(str)) – Outcomes.

  • bags (str, list(str)) – Path to bags, or list of bag file paths. Each bag should contain a PyTorch array of features from all tiles in a slide, with the shape (n_tiles, n_features).

  • config (slideflow.mil.TrainerConfigFastAI or slideflow.mil.TrainerConfigCLAM) – Configuration for building model. If weights is a path to a model directory, will attempt to read mil_params.json from this location and load saved configuration. Defaults to None.

Keyword Arguments:
  • outdir (str) – Path at which to save results.

  • attention_heatmaps (bool) – Generate attention heatmaps for slides. Not available for multi-modal MIL models. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.
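
A hedged usage sketch for evaluating a saved model. It assumes weights points to a model directory containing mil_params.json, so that config can be left as None; the helper name evaluate_saved_model and the output directory are hypothetical.

```python
def evaluate_saved_model(weights, dataset, outcome, bags, outdir="mil_eval"):
    """Evaluate a trained MIL model and return the predictions DataFrame.

    Hypothetical wrapper for illustration. `dataset` is assumed to be a
    held-out slideflow.Dataset, and `bags` a directory of *.pt bags.
    """
    # Deferred import: defining this helper does not require slideflow.
    from slideflow.mil import eval_mil

    predictions = eval_mil(
        weights,
        dataset=dataset,
        outcomes=outcome,
        bags=bags,
        config=None,            # read the saved configuration, if available
        outdir=outdir,
        attention_heatmaps=True,
        cmap="inferno",
    )
    return predictions
```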

predict_slide(model: str, slide: str | WSI, extractor: BaseFeatureExtractor | None = None, *, normalizer: StainNormalizer | None = None, config: _TrainerConfig | None = None, attention: bool = False, native_normalizer: bool | None = True, extractor_kwargs: dict | None = None) → Tuple[ndarray, ndarray | None]

Generate predictions (and attention) for a single slide.

Parameters:
  • model (str) – Path to MIL model.

  • slide (str) – Path to slide.

  • extractor (slideflow.mil.BaseFeatureExtractor, optional) –

    Feature extractor. If not provided, will attempt to auto-detect extractor from model.

    Note

    If the extractor has a stain normalizer, this will be used to normalize the slide before extracting features.

Keyword Arguments:
  • normalizer (slideflow.stain.StainNormalizer, optional) – Stain normalizer. If not provided, will attempt to use stain normalizer from extractor.

  • config (slideflow.mil.TrainerConfigFastAI or slideflow.mil.TrainerConfigCLAM) – Configuration for building model. If None, will attempt to read mil_params.json from the model directory and load saved configuration. Defaults to None.

  • attention (bool) – Whether to return attention scores. Defaults to False.

  • native_normalizer (bool, optional) – Whether to use PyTorch/TensorFlow-native stain normalization, if applicable. If False, will use the OpenCV/NumPy implementations. Defaults to None, which auto-detects based on the slide backend (False if libvips, True if cucim). This behavior is due to performance issues when using native stain normalization with libvips-compatible multiprocessing.

Returns:

Predictions and attention scores. Attention scores are None if attention is False, otherwise a masked 2D array with the same shape as the slide grid (arranged as a heatmap, with unused tiles masked).

Return type:

Tuple[np.ndarray, Optional[np.ndarray]]
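
A sketch of calling predict_slide and summarizing the returned attention. It assumes the feature extractor and configuration can be auto-detected from the model directory, and that the masked 2D array is a numpy.ma.MaskedArray (so that count() and max() ignore masked tiles). The helper name predict_with_attention is hypothetical.

```python
def predict_with_attention(model_dir, slide_path):
    """Return predictions plus simple attention statistics for one slide.

    Hypothetical wrapper for illustration.
    """
    # Deferred import: defining this helper does not require slideflow.
    from slideflow.mil import predict_slide

    preds, attention = predict_slide(model_dir, slide_path, attention=True)

    summary = {"predictions": preds}
    if attention is not None:
        # Assumption: `attention` is a NumPy masked array, where unused
        # tiles are masked and excluded from count() and max().
        summary["n_used_tiles"] = int(attention.count())
        summary["max_attention"] = float(attention.max())
    return summary
```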

TrainerConfigFastAI

class TrainerConfigFastAI(model: str | Callable = 'attention_mil', *, aggregation_level: str = 'slide', lr: float | None = None, wd: float = 1e-05, bag_size: int = 512, fit_one_cycle: bool = True, epochs: int = 32, batch_size: int = 64, drop_last: bool = True, save_monitor: str = 'valid_loss', **kwargs)

Training configuration for FastAI MIL models.

This configuration should not be created directly, but rather should be created through slideflow.mil.mil_config(), which will create and prepare an appropriate trainer configuration.

Parameters:

model (str, Callable) – Either the name of a model, or a custom torch module. Valid model names include "clam_sb", "clam_mb", "mil_fc", "mil_fc_mc", "attention_mil", and "transmil".

Keyword Arguments:
  • aggregation_level (str) – When equal to 'slide', each bag contains tiles from a single slide. When equal to 'patient', tiles from all slides of a patient are grouped together into one bag.

  • lr (float, optional) – Learning rate. If fit_one_cycle=True, this is the maximum learning rate. If None, uses the Leslie Smith LR Range test to find an optimal learning rate. Defaults to None.

  • wd (float) – Weight decay. Only used if fit_one_cycle=False. Defaults to 1e-5.

  • bag_size (int) – Bag size. Defaults to 512.

  • fit_one_cycle (bool) – Use 1cycle learning rate schedule. Defaults to True.

  • epochs (int) – Maximum number of epochs. Defaults to 32.

  • batch_size (int) – Batch size. Defaults to 64.

  • **kwargs – All additional keyword arguments are passed to either slideflow.mil.ModelConfigCLAM for CLAM models, or slideflow.mil.ModelConfigFastAI for all other models.
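
The difference between the two aggregation_level values can be illustrated with a small grouping sketch in plain Python. This is a conceptual illustration only, not slideflow's implementation; the function name group_bags and the slide-to-patient mapping are hypothetical.

```python
import os
from collections import defaultdict


def group_bags(bag_paths, slide_to_patient, aggregation_level="slide"):
    """Group bag files by slide or by patient.

    With 'slide', every bag file forms its own group. With 'patient',
    bag files from all slides of the same patient share one group.
    """
    if aggregation_level not in ("slide", "patient"):
        raise ValueError("aggregation_level must be 'slide' or 'patient'")

    groups = defaultdict(list)
    for path in bag_paths:
        # Assumption: the bag file is named after its slide, e.g. "s1.pt".
        slide = os.path.splitext(os.path.basename(path))[0]
        key = slide if aggregation_level == "slide" else slide_to_patient[slide]
        groups[key].append(path)
    return dict(groups)


bags = ["bags/s1.pt", "bags/s2.pt", "bags/s3.pt"]
patients = {"s1": "p1", "s2": "p1", "s3": "p2"}
print(group_bags(bags, patients, "patient"))
```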

slideflow.mil.TrainerConfigFastAI.model_fn

MIL model architecture (class/module).

slideflow.mil.TrainerConfigFastAI.loss_fn

MIL loss function.

to_dict(self)

Converts this training configuration to a dictionary.

json_dump(self)

Converts this training configuration to a JSON-compatible dict.

TrainerConfigCLAM

class TrainerConfigCLAM(*, num_splits: int = 1, k: int = 3, k_start: int = -1, k_end: int = -1, max_epochs: int = 20, lr: float = 0.0001, reg: float = 1e-05, label_frac: float = 1, weighted_sample: bool = False, log_data: bool = False, testing: bool = False, early_stopping: bool = False, subtyping: bool = False, seed: int = 1, results_dir: str | None = None, n_classes: int | None = None, split_dir=None, data_root_dir=None, micro_average=False, **kwargs)

Training configuration for the legacy CLAM trainer.

This configures the legacy CLAM trainer. The FastAI trainer is preferred for all models, including CLAM.

The configuration options for the legacy CLAM trainer are identical to the options in the original CLAM paper.

Keyword Arguments:
  • k (int) – Number of cross-fold splits. Defaults to 3.

  • k_start (int) – Starting cross-fold. Defaults to first cross-fold.

  • k_end (int) – Ending cross-fold. Defaults to ending after last cross-fold is done.

  • max_epochs (int) – Number of epochs to train. Defaults to 20.

  • lr (float) – Learning rate. Defaults to 1e-4.

  • reg (float) – Weight decay. Defaults to 1e-5.

  • weighted_sample (bool) – Equally sample from all outcome classes. Defaults to False.

  • log_data (bool) – Log to tensorboard. Defaults to False.

  • early_stopping (bool) – Stop the training if validation loss doesn’t improve after 5 epochs. Will not trigger early stopping until epoch 50. Defaults to False.

  • subtyping (bool) – Whether this is a subtyping problem. Defaults to False.

  • seed (int) – Set the random seed. Defaults to 1.

  • n_classes (int) – Number of outcome classes. Defaults to None.

  • micro_average (bool) – Use micro averaging when calculating AUROC.

  • **kwargs – All additional keyword arguments are passed to slideflow.mil.ModelConfigCLAM.
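
The early_stopping rule described above (no improvement for 5 epochs, never before epoch 50) can be sketched as follows. This is an illustration of the documented behavior, not the legacy trainer's code; the function name early_stop_epoch is hypothetical, and it assumes that stopping becomes possible at epoch 50 itself.

```python
def early_stop_epoch(val_losses, patience=5, min_epoch=50):
    """Return the 1-based epoch at which training would stop, or None.

    `val_losses` holds one validation loss per completed epoch.
    Assumption: stopping may trigger at `min_epoch` or later.
    """
    best = float("inf")
    stale = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
        if stale >= patience and epoch >= min_epoch:
            return epoch
    return None


# Loss stops improving after epoch 1, but stopping waits until epoch 50.
print(early_stop_epoch([1.0] + [2.0] * 59))
```

Under these assumptions, a run limited to the default max_epochs of 20 never reaches the minimum epoch, so early stopping only has an effect when max_epochs is raised.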

slideflow.mil.TrainerConfigCLAM.model_fn

MIL model architecture (class/module).

slideflow.mil.TrainerConfigCLAM.loss_fn

MIL loss function.

to_dict(self)

Converts this training configuration to a dictionary.

json_dump(self)

Converts this training configuration to a JSON-compatible dict.

ModelConfigFastAI

class ModelConfigFastAI(model: str | Callable = 'attention_mil', *, use_lens: bool | None = None, apply_softmax: bool = True, model_kwargs: dict | None = None, validate: bool = True, **kwargs)

Model configuration for a non-CLAM MIL model.

Parameters:

model (str, Callable) – Either the name of a model, or a custom torch module. Valid model names include "attention_mil" and "transmil". Defaults to ‘attention_mil’.

Keyword Arguments:

use_lens (bool, optional) – Whether the model expects a second argument to its .forward() function, an array with the bag size for each slide. If None, will default to True for 'attention_mil' models and False otherwise. Defaults to None.
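
To show why a model might need the bag size for each slide, the sketch below computes a masked mean over padded bags in plain Python. It assumes that bags in a batch are padded to a common length, which is an assumption made for illustration and not stated on this page; the function name masked_mean_pool is hypothetical.

```python
def masked_mean_pool(bags, lens):
    """Average tile features per bag, ignoring padded rows.

    `bags` is a list of padded bags, each a list of feature vectors of
    equal length. `lens` gives the number of real tiles in each bag.
    """
    pooled = []
    for bag, n_tiles in zip(bags, lens):
        real_tiles = bag[:n_tiles]          # drop the padding rows
        n_features = len(bag[0])
        pooled.append([
            sum(tile[j] for tile in real_tiles) / n_tiles
            for j in range(n_features)
        ])
    return pooled


padded = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # 2 real tiles, 1 padding row
    [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]],   # 1 real tile, 2 padding rows
]
print(masked_mean_pool(padded, lens=[2, 1]))
```

Without the lens argument, the padding rows would be averaged in and would dilute the pooled features of smaller bags.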

ModelConfigCLAM

class ModelConfigCLAM(*, model: str = 'clam_sb', model_size: str = 'small', bag_loss: str = 'ce', bag_weight: float = 0.7, dropout: bool = False, opt: str = 'adam', inst_loss: str = 'ce', no_inst_cluster: bool = False, B: int = 8, model_kwargs: dict | None = None, validate: bool = True, **kwargs)

Model configuration for CLAM models.

These configuration options are identical to the options in the original CLAM paper.

Keyword Arguments:
  • model (str) – Model. Either 'clam_sb', 'clam_mb', 'mil_fc', or 'mil_fc_mc'. Defaults to 'clam_sb'.

  • model_size (str) – Size of the model. Available sizes include:

    clam_sb
      small: [1024, 512, 256]
      big: [1024, 512, 384]
      multiscale: [2048, 512, 256]
      xception: [2048, 256, 128]
      xception_multi: [1880, 128, 64]
      xception_3800: [3800, 512, 256]

    clam_mb
      small: [1024, 512, 256]
      big: [1024, 512, 384]
      multiscale: [2048, 512, 256]

    mil_fc
      small: [1024, 512]

    mil_fc_mc
      small: [1024, 512]

  • bag_loss (str) – Primary loss function. Either ‘ce’ or ‘svm’. If ‘ce’, the model loss function is a cross entropy loss. If ‘svm’, the model loss is topk.SmoothTop1SVM. Defaults to ‘ce’.

  • bag_weight (float) – Weight of the bag loss. The total loss is defined as W * loss + (1 - W) * instance_loss, where W is the bag weight. Defaults to 0.7.

  • dropout (bool) – Add dropout (p=0.25) after the attention layers. Defaults to False.

  • opt (str) – Optimizer. Either ‘adam’ (Adam optimizer) or ‘sgd’ (Stochastic Gradient Descent). Defaults to ‘adam’.

  • inst_loss (str) – Instance loss function. Either ‘ce’ or ‘svm’. If ‘ce’, the instance loss is a cross entropy loss. If ‘svm’, the loss is topk.SmoothTop1SVM. Defaults to ‘ce’.

  • no_inst_cluster (bool) – Disable instance-level clustering. Defaults to False.

  • B (int) – Number of positive/negative patches to sample for instance-level training. Defaults to 8.

  • validate (bool) – Validate the hyperparameter configuration. Defaults to True.
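
The bag_weight formula above can be checked with a short worked example. The function name clam_total_loss and the loss values are illustrative only.

```python
def clam_total_loss(bag_loss, instance_loss, bag_weight=0.7):
    """Combine bag and instance losses: W * loss + (1 - W) * instance_loss."""
    return bag_weight * bag_loss + (1.0 - bag_weight) * instance_loss


# With the default weight, the bag loss contributes 70% of the total.
total = clam_total_loss(bag_loss=1.0, instance_loss=2.0)
print(round(total, 6))
```

Setting bag_weight to 1.0 removes the instance loss from the total entirely.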