slideflow.model

This module provides the ModelParams class to organize model and training parameters/hyperparameters and assist with model building, as well as the Trainer class that executes model training and evaluation. RegressionTrainer and SurvivalTrainer are extensions of this class, supporting regression and Cox Proportional Hazards outcomes, respectively. The function build_trainer() selects and returns the appropriate Trainer instance based on the provided hyperparameters.

Note

To support both Tensorflow and PyTorch backends, the slideflow.model module imports either slideflow.model.tensorflow or slideflow.model.torch according to the currently active backend, indicated by the environment variable SF_BACKEND.
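Because the backend is resolved from SF_BACKEND, the variable generally needs to be set before Slideflow is imported. The sketch below mirrors that selection in plain Python. The helper resolve_backend_module() is hypothetical (it is not part of Slideflow), and the accepted values 'tensorflow' and 'torch' are assumptions inferred from the submodule names above.

```python
import os

# Submodule names come from the note above; the accepted SF_BACKEND values
# ('tensorflow', 'torch') are assumed for illustration.
_BACKEND_MODULES = {
    "tensorflow": "slideflow.model.tensorflow",
    "torch": "slideflow.model.torch",
}


def resolve_backend_module(environ=None):
    """Hypothetical helper: return the submodule name selected by SF_BACKEND."""
    environ = os.environ if environ is None else environ
    backend = environ.get("SF_BACKEND")
    if backend not in _BACKEND_MODULES:
        raise ValueError(
            f"SF_BACKEND must be one of {sorted(_BACKEND_MODULES)}, got {backend!r}"
        )
    return _BACKEND_MODULES[backend]


# Set the variable first, then import slideflow in the same process.
os.environ["SF_BACKEND"] = "torch"
print(resolve_backend_module())
```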

See Training for a detailed look at how to train models.

Trainer

class Trainer(hp: ModelParams, outdir: str, labels: Dict[str, Any], *, slide_input: Dict[str, Any] | None = None, name: str = 'Trainer', feature_sizes: List[int] | None = None, feature_names: List[str] | None = None, outcome_names: List[str] | None = None, mixed_precision: bool = True, allow_tf32: bool = False, config: Dict[str, Any] | None = None, use_neptune: bool = False, neptune_api: str | None = None, neptune_workspace: str | None = None, load_method: str = 'weights', custom_objects: Dict[str, Any] | None = None, device: str | None = None, transform: Callable | Dict[str, Callable] | None = None, pin_memory: bool = True, num_workers: int = 4, chunk_size: int = 8)[source]

Base trainer class containing functionality for model building, input processing, training, and evaluation.

This base class requires categorical outcome(s). Additional outcome types are supported by slideflow.model.RegressionTrainer and slideflow.model.SurvivalTrainer.

Slide-level (e.g. clinical) features can be used as additional model input by providing slide labels in the slide annotations dictionary, under the key ‘input’.

Sets base configuration, preparing model inputs and outputs.

Parameters:
  • hp (slideflow.ModelParams) – ModelParams object.

  • outdir (str) – Destination for event logs and checkpoints.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to ‘Trainer’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as model input.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • config (dict, optional) – Training configuration dictionary, used for logging and image format verification. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace. Defaults to None.

  • load_method (str) – Loading method to use when reading model. This argument is ignored in the PyTorch backend, as all models are loaded by first building the model with hyperparameters detected in params.json, then loading weights with torch.nn.Module.load_state_dict(). Defaults to ‘weights’ (ignored).

  • transform (callable or dict, optional) – Optional transform to apply to input images. If dict, must have the keys ‘train’ and/or ‘val’, mapping to callables that take a single image Tensor as input and return a single image Tensor. If None, no transform is applied. If a single callable is provided, it will be applied to both training and validation data. If a dict is provided, the ‘train’ transform will be applied to training data and the ‘val’ transform will be applied to validation data. If a dict is provided and either ‘train’ or ‘val’ is None, no transform will be applied to that data. Defaults to None.

  • pin_memory (bool) – Set the pin_memory attribute for dataloaders. Defaults to True.

  • num_workers (int) – Set the number of workers for dataloaders. Defaults to 4.

  • chunk_size (int) – Set the chunk size for TFRecord reading. Defaults to 8.
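The rules for the transform argument can be summarized in a few lines. This is a minimal sketch of the described behavior, not Slideflow's implementation: resolve_transform() and flip() are hypothetical names, and treating a missing dict key the same as None is an assumption.

```python
from typing import Callable, Dict, Optional, Union

Transform = Callable[[object], object]


def resolve_transform(
    transform: Union[Transform, Dict[str, Optional[Transform]], None],
    split: str,
) -> Optional[Transform]:
    """Hypothetical helper: pick the transform used for 'train' or 'val'."""
    if split not in ("train", "val"):
        raise ValueError(f"Unknown split: {split!r}")
    if transform is None:
        return None  # no transform is applied
    if callable(transform):
        return transform  # a single callable is shared by both splits
    # Dict case: a None value (or, by assumption, a missing key)
    # means no transform for that split.
    return transform.get(split)


def flip(image):
    # Stand-in augmentation that reverses a sequence.
    return image[::-1]


print(resolve_transform(flip, "val") is flip)
print(resolve_transform({"train": flip, "val": None}, "val"))
```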

load(self, model: str, training=True) None

Loads a state dict at the given model location. Requires that the Trainer’s hyperparameters (Trainer.hp) match the hyperparameters of the model to be loaded.

evaluate(self, dataset: Dataset, batch_size: int | None = None, save_predictions: bool | str = 'parquet', reduce_method: str | Callable = 'average', norm_fit: Dict[str, ndarray] | Dict[str, List] | None = None, uq: bool | str = 'auto', from_wsi: bool = False, roi_method: str = 'auto')

Evaluate model, saving metrics and predictions.

Parameters:
  • dataset (slideflow.dataset.Dataset) – Dataset to evaluate.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp).

  • save_predictions (bool or str, optional) – Save tile, slide, and patient-level predictions at each evaluation. May be ‘csv’, ‘feather’, or ‘parquet’. If False, will not save predictions. Defaults to ‘parquet’.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

  • norm_fit (Dict[str, np.ndarray]) – Normalizer fit, mapping fit parameters (e.g. target_means, target_stds) to values (np.ndarray). If not provided, will fit normalizer using model params (if applicable). Defaults to None.

  • uq (bool or str, optional) – Enable UQ estimation (for applicable models). Defaults to ‘auto’.

  • from_wsi (bool) – Generate predictions from tiles dynamically extracted from whole-slide images, rather than TFRecords. Defaults to False (use TFRecords).

  • roi_method (str) – ROI method to use if from_wsi=True (ignored if from_wsi=False). Either ‘inside’, ‘outside’, ‘auto’, ‘ignore’. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and raise errors.MissingROIError if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

Returns:

Dictionary of evaluation metrics.
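To make the difference between the ‘average’ and ‘proportion’ reduction methods concrete, the sketch below applies both to three tile-level predictions from one slide. It uses plain Python lists rather than the pandas machinery mentioned above; the function names and the example values are illustrative assumptions.

```python
from statistics import mean
from typing import List


def reduce_average(tile_preds: List[List[float]]) -> List[float]:
    """Average each class prediction across all tiles."""
    n_classes = len(tile_preds[0])
    return [mean(tile[c] for tile in tile_preds) for c in range(n_classes)]


def reduce_proportion(tile_preds: List[List[float]]) -> List[float]:
    """One-hot encode each tile's top class, then average the one-hot rows."""
    n_classes = len(tile_preds[0])
    onehot = []
    for tile in tile_preds:
        winner = max(range(n_classes), key=lambda c: tile[c])
        onehot.append([1.0 if c == winner else 0.0 for c in range(n_classes)])
    return reduce_average(onehot)


# Three tiles, two classes (illustrative values).
tiles = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
print(reduce_average(tiles))
print(reduce_proportion(tiles))
```

With ‘proportion’, the slide-level value is the fraction of tiles voting for each class, so confident and marginal tiles count equally.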

predict(self, dataset: Dataset, batch_size: int | None = None, norm_fit: Dict[str, ndarray] | Dict[str, List] | None = None, format: str = 'parquet', from_wsi: bool = False, roi_method: str = 'auto', reduce_method: str | Callable = 'average') Dict[str, DataFrame]

Perform inference on a model, saving predictions.

Parameters:
  • dataset (slideflow.dataset.Dataset) – Dataset containing TFRecords to evaluate.

  • batch_size (int, optional) – Evaluation batch size. Defaults to the same as training (per self.hp).

  • norm_fit (Dict[str, np.ndarray]) – Normalizer fit, mapping fit parameters (e.g. target_means, target_stds) to values (np.ndarray). If not provided, will fit normalizer using model params (if applicable). Defaults to None.

  • format (str, optional) – Format in which to save predictions. Either ‘csv’, ‘feather’, or ‘parquet’. Defaults to ‘parquet’.

  • from_wsi (bool) – Generate predictions from tiles dynamically extracted from whole-slide images, rather than TFRecords. Defaults to False (use TFRecords).

  • roi_method (str) – ROI method to use if from_wsi=True (ignored if from_wsi=False). Either ‘inside’, ‘outside’, ‘auto’, ‘ignore’. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and raise errors.MissingROIError if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

Returns:

Dictionary with keys ‘tile’, ‘slide’, and ‘patient’, and values containing DataFrames with tile-, slide-, and patient-level predictions.

Return type:

Dict[str, pd.DataFrame]
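The roi_method options described above amount to a small decision table. The following sketch encodes that table for illustration only: choose_tile_region() is a hypothetical helper, and the MissingROIError class here is a local stand-in for the Slideflow error of the same name.

```python
class MissingROIError(Exception):
    """Local stand-in for slideflow's errors.MissingROIError."""


def choose_tile_region(roi_method: str, has_roi: bool) -> str:
    """Hypothetical helper: return where tiles would be extracted from."""
    if roi_method in ("inside", "outside"):
        if not has_roi:
            # 'inside' and 'outside' both require an ROI to exist.
            raise MissingROIError(f"roi_method={roi_method!r} requires an ROI")
        return roi_method
    if roi_method == "auto":
        # Use the ROI when present, otherwise fall back to the whole slide.
        return "inside" if has_roi else "whole-slide"
    if roi_method == "ignore":
        return "whole-slide"
    raise ValueError(f"Unrecognized roi_method: {roi_method!r}")


for method in ("auto", "ignore"):
    for has_roi in (True, False):
        print(method, has_roi, choose_tile_region(method, has_roi))
```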

train(self, train_dts: Dataset, val_dts: Dataset, log_frequency: int = 20, validate_on_batch: int = 0, validation_batch_size: int | None = None, validation_steps: int = 50, starting_epoch: int = 0, ema_observations: int = 20, ema_smoothing: int = 2, use_tensorboard: bool = True, steps_per_epoch_override: int = 0, save_predictions: bool | str = 'parquet', save_model: bool = True, resume_training: str | None = None, pretrain: str | None = 'imagenet', checkpoint: str | None = None, save_checkpoints: bool = False, multi_gpu: bool = False, norm_fit: Dict[str, ndarray] | Dict[str, List] | None = None, reduce_method: str | Callable = 'average', seed: int = 0, from_wsi: bool = False, roi_method: str = 'auto') Dict[str, Any]

Builds and trains a model from hyperparameters.

Parameters:
  • train_dts (slideflow.dataset.Dataset) – Training dataset.

  • val_dts (slideflow.dataset.Dataset) – Validation dataset.

  • log_frequency (int, optional) – How frequently to update Tensorboard logs, in batches. Defaults to 20.

  • validate_on_batch (int, optional) – Validation will be performed every N batches. Defaults to 0.

  • validation_batch_size (int, optional) – Validation batch size. Defaults to same as training (per self.hp).

  • validation_steps (int, optional) – Number of batches to use for each instance of validation. Defaults to 50.

  • starting_epoch (int, optional) – Starts training at this epoch. Defaults to 0.

  • ema_observations (int, optional) – Number of observations over which to perform exponential moving average smoothing. Defaults to 20.

  • ema_smoothing (int, optional) – Exponential average smoothing value. Defaults to 2.

  • use_tensorboard (bool, optional) – Enable tensorboard callbacks. Defaults to True.

  • steps_per_epoch_override (int, optional) – Manually set the number of steps per epoch. Defaults to 0.

  • save_predictions (bool or str, optional) – Save tile, slide, and patient-level predictions at each evaluation. May be ‘csv’, ‘feather’, or ‘parquet’. If False, will not save predictions. Defaults to ‘parquet’.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • resume_training (str, optional) – Not applicable to PyTorch backend. Included as argument for compatibility with Tensorflow backend. Will raise NotImplementedError if supplied.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

  • norm_fit (Dict[str, np.ndarray]) – Normalizer fit, mapping fit parameters (e.g. target_means, target_stds) to values (np.ndarray). If not provided, will fit normalizer using model params (if applicable). Defaults to None.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

  • seed (int) – Set numpy random seed. Defaults to 0.

  • from_wsi (bool) – Generate predictions from tiles dynamically extracted from whole-slide images, rather than TFRecords. Defaults to False (use TFRecords).

  • roi_method (str) – ROI method to use if from_wsi=True (ignored if from_wsi=False). Either ‘inside’, ‘outside’, ‘auto’, ‘ignore’. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and raise errors.MissingROIError if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

Returns:

Nested dict containing metrics for each evaluated epoch.

Return type:

Dict
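The ema_observations and ema_smoothing arguments control exponential moving average smoothing of a monitored value. The source does not state the exact formula, so the sketch below assumes the conventional weighting smoothing / (1 + observations) and seeds the average with the first value; both choices, and the function names, are assumptions.

```python
from typing import List


def ema_weight(observations: int = 20, smoothing: float = 2) -> float:
    """Weight given to the newest value (assumed conventional EMA form)."""
    return smoothing / (1 + observations)


def smooth(values: List[float], observations: int = 20, smoothing: float = 2) -> List[float]:
    """Return the exponentially smoothed series, seeded with the first value."""
    if not values:
        return []
    weight = ema_weight(observations, smoothing)
    smoothed = [values[0]]
    for value in values[1:]:
        smoothed.append(value * weight + smoothed[-1] * (1 - weight))
    return smoothed


# With the defaults (20 observations, smoothing 2) the newest value gets weight 2/21.
print(ema_weight())
print(smooth([1.0, 0.8, 0.9, 0.7]))
```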

RegressionTrainer

class RegressionTrainer(*args, **kwargs)[source]

Extends the base slideflow.model.Trainer class to add support for continuous outcomes. Requires that all outcomes be continuous, with appropriate regression loss function. Uses R-squared as the evaluation metric, rather than AUROC.

In this case, for the PyTorch backend, the continuous outcomes support is already baked into the base Trainer class, so no additional modifications are required. This class is written to inherit the Trainer class without modification to maintain consistency with the Tensorflow backend.

Sets base configuration, preparing model inputs and outputs.

Parameters:
  • hp (slideflow.ModelParams) – ModelParams object.

  • outdir (str) – Destination for event logs and checkpoints.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to ‘Trainer’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as model input.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • config (dict, optional) – Training configuration dictionary, used for logging and image format verification. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace. Defaults to None.

  • load_method (str) – Loading method to use when reading model. This argument is ignored in the PyTorch backend, as all models are loaded by first building the model with hyperparameters detected in params.json, then loading weights with torch.nn.Module.load_state_dict(). Defaults to ‘weights’ (ignored).

  • transform (callable or dict, optional) – Optional transform to apply to input images. If dict, must have the keys ‘train’ and/or ‘val’, mapping to callables that take a single image Tensor as input and return a single image Tensor. If None, no transform is applied. If a single callable is provided, it will be applied to both training and validation data. If a dict is provided, the ‘train’ transform will be applied to training data and the ‘val’ transform will be applied to validation data. If a dict is provided and either ‘train’ or ‘val’ is None, no transform will be applied to that data. Defaults to None.

  • pin_memory (bool) – Set the pin_memory attribute for dataloaders. Defaults to True.

  • num_workers (int) – Set the number of workers for dataloaders. Defaults to 4.

  • chunk_size (int) – Set the chunk size for TFRecord reading. Defaults to 8.
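RegressionTrainer reports R-squared rather than AUROC. As a reminder of what that metric measures, here is a minimal sketch of the coefficient of determination in plain Python. Whether Slideflow computes R-squared in exactly this form is not stated in the source, so treat the formula choice and the function name as assumptions.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when y_true is constant")
    return 1 - ss_res / ss_tot


# Perfect predictions score 1; always predicting the mean scores 0.
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))
```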

SurvivalTrainer

class SurvivalTrainer(*args, **kwargs)[source]

Cox proportional hazards (CPH) models are not yet implemented, but are planned for a future update.

Sets base configuration, preparing model inputs and outputs.

Parameters:
  • hp (slideflow.ModelParams) – ModelParams object.

  • outdir (str) – Destination for event logs and checkpoints.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to ‘Trainer’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as model input.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • config (dict, optional) – Training configuration dictionary, used for logging and image format verification. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace. Defaults to None.

  • load_method (str) – Loading method to use when reading model. This argument is ignored in the PyTorch backend, as all models are loaded by first building the model with hyperparameters detected in params.json, then loading weights with torch.nn.Module.load_state_dict(). Defaults to ‘weights’ (ignored).

  • transform (callable or dict, optional) – Optional transform to apply to input images. If dict, must have the keys ‘train’ and/or ‘val’, mapping to callables that take a single image Tensor as input and return a single image Tensor. If None, no transform is applied. If a single callable is provided, it will be applied to both training and validation data. If a dict is provided, the ‘train’ transform will be applied to training data and the ‘val’ transform will be applied to validation data. If a dict is provided and either ‘train’ or ‘val’ is None, no transform will be applied to that data. Defaults to None.

  • pin_memory (bool) – Set the pin_memory attribute for dataloaders. Defaults to True.

  • num_workers (int) – Set the number of workers for dataloaders. Defaults to 4.

  • chunk_size (int) – Set the chunk size for TFRecord reading. Defaults to 8.

Features

class Features(path: str | None, layers: str | List[str] | None = 'postconv', *, include_preds: bool = False, mixed_precision: bool = True, channels_last: bool = True, device: device | None = None, apply_softmax: bool | None = None, pooling: Any | None = None, load_method: str = 'weights')[source]

Interface for obtaining predictions and features from intermediate layer activations from Slideflow models.

Use by calling on either a batch of images (returning outputs for a single batch), or by calling on a slideflow.WSI object, which will generate an array of spatially-mapped activations matching the slide.

Examples

Calling on batch of images:

interface = Features('/model/path', layers='postconv')
for image_batch in train_data:
    # Return shape: (batch_size, num_features)
    batch_features = interface(image_batch)

Calling on a slide:

slide = sf.slide.WSI(...)
interface = Features('/model/path', layers='postconv')
# Return shape:
# (slide.grid.shape[0], slide.grid.shape[1], num_features)
activations_grid = interface(slide)

Note

When this interface is called on a batch of images, no image processing or stain normalization will be performed, as it is assumed that normalization will occur during data loader image processing. When the interface is called on a slideflow.WSI, the normalization strategy will be read from the model configuration file, and normalization will be performed on image tiles extracted from the WSI. If this interface was created from an existing model and there is no model configuration file to read, a slideflow.norm.StainNormalizer object may be passed during initialization via the argument wsi_normalizer.

Creates an activations interface from a saved slideflow model which outputs feature activations at the designated layers.

Intermediate layers are returned in the order of layers. Predictions are returned last.

Parameters:
  • path (str) – Path to saved Slideflow model.

  • layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.

  • include_preds (bool, optional) – Include predictions in output. Will be returned last. Defaults to False.

  • mixed_precision (bool, optional) – Use mixed precision. Defaults to True.

  • device (torch.device, optional) – Device for model. Defaults to torch.device(‘cuda’).

  • apply_softmax (bool) – Apply softmax transformation to model output. Defaults to True for classification models, False for regression models.

  • pooling (Callable or str, optional) – PyTorch pooling function to use on feature layers. May be a string (‘avg’ or ‘max’) or a callable PyTorch function.

  • load_method (str) – Loading method to use when reading model. This argument is ignored in the PyTorch backend, as all models are loaded by first building the model with hyperparameters detected in params.json, then loading weights with torch.nn.Module.load_state_dict(). Defaults to ‘weights’ (ignored).

from_model(model: Module, tile_px: int, layers: str | List[str] | None = 'postconv', *, include_preds: bool = False, mixed_precision: bool = True, channels_last: bool = True, wsi_normalizer: StainNormalizer | None = None, apply_softmax: bool = True, pooling: Any | None = None)

Creates an activations interface from a loaded slideflow model which outputs feature activations at the designated layers.

Intermediate layers are returned in the order of layers. Predictions are returned last.

Parameters:
  • model (torch.nn.Module) – Loaded model.

  • tile_px (int) – Width/height of input image size.

  • layers (list(str), optional) – Layers from which to generate activations. The post-convolution activation layer is accessed via ‘postconv’. Defaults to ‘postconv’.

  • include_preds (bool, optional) – Include predictions in output. Will be returned last. Defaults to False.

  • mixed_precision (bool, optional) – Use mixed precision. Defaults to True.

  • wsi_normalizer (slideflow.norm.StainNormalizer) – Stain normalizer to use on whole-slide images. Is not used on individual tile datasets via __call__. Defaults to None.

  • apply_softmax (bool) – Apply softmax transformation to model output. Defaults to True.

  • pooling (Callable or str, optional) – PyTorch pooling function to use on feature layers. May be a string (‘avg’ or ‘max’) or a callable PyTorch function.

__call__(self, inp: Tensor | WSI, **kwargs) List[Tensor] | ndarray | None

Process a given input and return activations and/or predictions. Expects either a batch of images or a slideflow.slide.WSI object.

When calling on a WSI object, keyword arguments are passed to slideflow.WSI.build_generator().

Other functions

build_trainer(hp: ModelParams, outdir: str, labels: Dict[str, Any], **kwargs) Trainer[source]

From the given slideflow.ModelParams object, returns the appropriate instance of slideflow.model.Trainer.

Parameters:
  • hp (slideflow.ModelParams) – ModelParams object.

  • outdir (str) – Path for event logs and checkpoints.

  • labels (dict) – Dict mapping slide names to outcome labels (int or float format).

Keyword Arguments:
  • slide_input (dict) – Dict mapping slide names to additional slide-level input, concatenated after post-conv.

  • name (str, optional) – Optional name describing the model, used for model saving. Defaults to ‘Trainer’.

  • feature_sizes (list, optional) – List of sizes of input features. Required if providing additional input features as input to the model.

  • feature_names (list, optional) – List of names for input features. Used when permuting feature importance.

  • outcome_names (list, optional) – Name of each outcome. Defaults to “Outcome {X}” for each outcome.

  • mixed_precision (bool, optional) – Use FP16 mixed precision (rather than FP32). Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • config (dict, optional) – Training configuration dictionary, used for logging. Defaults to None.

  • use_neptune (bool, optional) – Use Neptune API logging. Defaults to False.

  • neptune_api (str, optional) – Neptune API token, used for logging. Defaults to None.

  • neptune_workspace (str, optional) – Neptune workspace. Defaults to None.

  • load_method (str) – Either ‘full’ or ‘weights’. Method to use when loading a Tensorflow model. If ‘full’, loads the model with tf.keras.models.load_model(). If ‘weights’, will read the params.json configuration file, build the model architecture, and then load weights from the given model with Model.load_weights(). Loading with ‘full’ may improve compatibility across Slideflow versions. Loading with ‘weights’ may improve compatibility across hardware & environments.

  • custom_objects (dict, Optional) – Dictionary mapping names (strings) to custom classes or functions. Defaults to None.

  • num_workers (int) – Number of dataloader workers. Only used for PyTorch. Defaults to 4.
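Conceptually, build_trainer() maps the outcome type implied by the hyperparameters to one of the three trainer classes. The sketch below shows that dispatch pattern only. The classes are local stand-ins that merely share names with the Slideflow classes, pick_trainer_class() is hypothetical, and the outcome-type strings are assumptions rather than values confirmed by the source.

```python
class Trainer:
    """Local stand-in for the categorical-outcome trainer."""


class RegressionTrainer(Trainer):
    """Local stand-in for the continuous-outcome trainer."""


class SurvivalTrainer(Trainer):
    """Local stand-in for the survival-outcome trainer."""


# Assumed outcome-type labels, used only for this illustration.
_TRAINER_BY_OUTCOME = {
    "classification": Trainer,
    "regression": RegressionTrainer,
    "survival": SurvivalTrainer,
}


def pick_trainer_class(outcome_type: str) -> type:
    """Hypothetical helper: return the trainer class for an outcome type."""
    try:
        return _TRAINER_BY_OUTCOME[outcome_type]
    except KeyError:
        raise ValueError(f"Unsupported outcome type: {outcome_type!r}") from None


print(pick_trainer_class("regression").__name__)
```

Because both specialized classes derive from Trainer, callers can treat whatever is returned as a Trainer, which matches the declared return type of build_trainer().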

build_feature_extractor(name: str, backend: str | None = None, **kwargs) BaseFeatureExtractor[source]

Build a feature extractor.

The returned feature extractor is a callable object, which returns features (often layer activations) for either a batch of images or a slideflow.WSI object.

If generating features for a batch of images, images are expected to be in (B, W, H, C) format and non-standardized (scaled 0-255) with dtype uint8. The feature extractors perform all needed preprocessing on the fly.

If generating features for a slide, the slide is expected to be a slideflow.WSI object. The feature extractor will generate features for each tile in the slide, returning a numpy array of shape (W, H, F), where F is the number of features.

Parameters:

name (str) – Name of the feature extractor to build. Available feature extractors are listed with slideflow.model.list_extractors().

Keyword Arguments:
  • tile_px (int) – Tile size (input image size), in pixels.

  • **kwargs (Any) – All remaining keyword arguments are passed to the feature extractor factory function, and may be different for each extractor.

Returns:

A callable object which accepts a batch of images (B, W, H, C) of dtype uint8 and returns a batch of features (dtype float32).

Examples

Create an extractor that calculates post-convolutional layer activations from an imagenet-pretrained Resnet50 model.

import slideflow as sf

extractor = sf.build_feature_extractor(
    'resnet50_imagenet'
)

Create an extractor that calculates ‘conv4_block4_2_relu’ activations from an imagenet-pretrained Resnet50 model.

extractor = sf.build_feature_extractor(
    'resnet50_imagenet',
    layers='conv4_block4_2_relu'
)

Create a pretrained “CTransPath” extractor.

extractor = sf.build_feature_extractor('ctranspath')

Use an extractor to calculate layer activations for an entire dataset.

import slideflow as sf

# Load a project and dataset
P = sf.load_project(...)
dataset = P.dataset(...)

# Create a feature extractor
resnet = sf.build_feature_extractor(
    'resnet50_imagenet'
)

# Calculate features for the entire dataset
features = sf.DatasetFeatures(
    resnet,
    dataset=dataset
)

Generate a map of features across a slide.

import slideflow as sf

# Load a slide
wsi = sf.WSI(...)

# Create a feature extractor
retccl = sf.build_feature_extractor(
    'retccl',
    resize=True
)

# Create a feature map, a 2D array of shape
# (W, H, F), where F is the number of features.
features = retccl(wsi)

list_extractors()[source]

Return a list of all available feature extractors.

load(path: str) Module[source]

Load a model trained with Slideflow.

Parameters:

path (str) – Path to saved model. Must be a model trained in Slideflow.

Returns:

Loaded model.

Return type:

torch.nn.Module

is_tensorflow_model(arg: Any) bool[source]

Checks if the object is a Tensorflow Model or path to Tensorflow model.

is_tensorflow_tensor(arg: Any) bool[source]

Checks if the given object is a Tensorflow Tensor.

is_torch_model(arg: Any) bool[source]

Checks if the object is a PyTorch Module or path to PyTorch model.

is_torch_tensor(arg: Any) bool[source]

Checks if the given object is a PyTorch Tensor.

read_hp_sweep(filename: str, models: List[str] | None = None) Dict[str, ModelParams][source]

Organizes a list of hyperparameter objects and associated model names.

Parameters:
  • filename (str) – Path to hyperparameter sweep JSON file.

  • models (list(str)) – List of model names. Defaults to None. If not supplied, returns all valid models from batch file.

Returns:

List of (Hyperparameter, model_name) for each HP combination

rebuild_extractor(bags_or_model: str, allow_errors: bool = False, native_normalizer: bool = True) Tuple[BaseFeatureExtractor | None, StainNormalizer | None][source]

Recreate the extractor used to generate features stored in bags.

Parameters:
  • bags_or_model (str) – Either a path to directory containing feature bags, or a path to a trained MIL model. If a path to a trained MIL model, the extractor used to generate features will be recreated.

  • allow_errors (bool) – If True, return None if the extractor cannot be rebuilt. If False, raise an error. Defaults to False.

  • native_normalizer (bool, optional) – Whether to use PyTorch/Tensorflow-native stain normalization, if applicable. If False, will use the OpenCV/Numpy implementations. Defaults to True.

Returns:

Tuple of (extractor, normalizer). The extractor is the extractor function, or None if allow_errors is True and the extractor cannot be rebuilt. The normalizer is the stain normalizer used when generating feature bags, or None if no stain normalization was used.

Return type:

Tuple[Optional[BaseFeatureExtractor], Optional[StainNormalizer]]