slideflow.Project

class Project(root: str, use_neptune: bool = False, create: bool = False, **kwargs)

Assists with project organization and execution of common tasks.

Load or create a project at a given directory.

If a project does not exist at the given root directory, one can be created if a project configuration was provided via keyword arguments.

Create a project:

import slideflow as sf
P = sf.Project('/project/path', name=..., ...)

Load an existing project:

P = sf.Project('/project/path')
Parameters:

root (str) – Path to project directory.

Keyword Arguments:
  • name (str) – Project name. Defaults to ‘MyProject’.

  • annotations (str) – Path to annotations CSV file. Defaults to ‘./annotations.csv’.

  • dataset_config (str) – Path to dataset configuration JSON file. Defaults to ‘./datasets.json’.

  • sources (list(str)) – List of dataset sources to include in project. Defaults to ‘source1’.

  • models_dir (str) – Path to directory in which to save models. Defaults to ‘./models’.

  • eval_dir (str) – Path to directory in which to save evaluations. Defaults to ‘./eval’.

Raises:

slideflow.errors.ProjectError – if project folder does not exist, or the folder exists but kwargs are provided.

Attributes

Project.annotations

Path to annotations file.

Project.dataset_config

Path to dataset configuration JSON file.

Project.eval_dir

Path to evaluation directory.

Project.models_dir

Path to models directory.

Project.name

Descriptive project name.

Project.neptune_api

Neptune API token.

Project.neptune_workspace

Neptune workspace name.

Project.sources

List of dataset sources active in this project.

Methods

add_source(self, name: str, *, slides: str | None = None, roi: str | None = None, tiles: str | None = None, tfrecords: str | None = None, path: str | None = None) → None

Add a dataset source to the dataset configuration file.

Parameters:

name (str) – Dataset source name.

Keyword Arguments:
  • slides (str, optional) – Path to directory containing slides. Defaults to None.

  • roi (str, optional) – Path to directory containing CSV ROIs. Defaults to None.

  • tiles (str, optional) – Path to directory for loose extracted tiles images (*.jpg, *.png). Defaults to None.

  • tfrecords (str, optional) – Path to directory for storing TFRecords of tiles. Defaults to None.

  • path (str, optional) – Path to dataset configuration file. If not provided, uses project default. Defaults to None.

associate_slide_names(self) → None

Automatically associate patients with slides in the annotations.

cell_segmentation(self, diam_um: float, dest: str | None = None, *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, sources: str | List[str], **kwargs) → None

Perform cell segmentation on slides, saving segmentation masks.

Cells are segmented with Cellpose from whole-slide images, and segmentation masks are saved in the masks/ subfolder within the project root directory.

Note

Cell segmentation requires installation of the cellpose package available via pip:

pip install cellpose
Parameters:
  • diam_um (float) – Cell segmentation diameter, in microns.

  • dest (str, optional) – Destination in which to save cell segmentation masks. If None, will save masks in {project_root}/masks. Defaults to None.

Keyword Arguments:
  • batch_size (int) – Batch size for cell segmentation. Defaults to 8.

  • cp_thresh (float) – Cell probability threshold. Pixels with values above this threshold are kept for masks; decrease it to find more and larger masks. Defaults to 0.

  • diam_mean (int, optional) – Cell diameter to detect, in pixels (without image resizing). If None, uses Cellpose defaults (17 for the ‘nuclei’ model, 30 for all others).

  • downscale (float) – Factor by which to downscale generated masks after calculation. Defaults to None (keep masks at original size).

  • flow_threshold (float) – Flow error threshold (all cells with errors below threshold are kept). Defaults to 0.4.

  • gpus (int, list(int)) – GPUs to use for cell segmentation. Defaults to 0 (first GPU).

  • interp (bool) – Interpolate during 2D dynamics. Defaults to True.

  • qc (str) – Slide-level quality control method to use before performing cell segmentation. Defaults to “Otsu”.

  • model (str, cellpose.models.Cellpose) – Cellpose model to use for cell segmentation. May be any valid cellpose model. Defaults to ‘cyto2’.

  • mpp (float) – Microns-per-pixel at which cells should be segmented. Defaults to 0.5.

  • num_workers (int, optional) – Number of workers. Defaults to 2 * num_gpus.

  • save_centroid (bool) – Save mask centroids. Increases memory utilization slightly. Defaults to True.

  • save_flow (bool) – Save flow values for the whole-slide image. Increases memory utilization. Defaults to False.

  • sources (List[str]) – List of dataset sources to include from configuration file.

  • tile (bool) – Tiles image to decrease GPU/CPU memory usage. Defaults to True.

  • verbose (bool) – Verbose log output at the INFO level. Defaults to True.

  • window_size (int) – Window size at which to segment cells across a whole-slide image. Defaults to 256.

Returns:

None
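The relationship between diam_um and mpp is a unit conversion. The following is a minimal sketch of that arithmetic, not Slideflow's implementation; the helper names diameter_in_pixels and default_num_workers are hypothetical and exist only for illustration.

```python
def diameter_in_pixels(diam_um: float, mpp: float = 0.5) -> float:
    """Convert a cell diameter in microns to pixels at a given microns-per-pixel.

    Hypothetical helper; Slideflow may handle this conversion differently.
    """
    if mpp <= 0:
        raise ValueError("mpp must be positive")
    return diam_um / mpp


def default_num_workers(num_gpus: int) -> int:
    """Documented default for num_workers: 2 * num_gpus."""
    return 2 * num_gpus


# A 10 micron cell segmented at 0.5 microns-per-pixel spans 20 pixels.
print(diameter_in_pixels(10, mpp=0.5))
print(default_num_workers(2))
```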

create_blank_annotations(self, filename: str | None = None) → None

Create an empty annotations file.

Parameters:

filename (str, optional) – Annotations file destination. If not provided, will use project default.

create_hp_sweep(self, filename: str = 'sweep.json', label: str | None = None, **kwargs: Any) → None

Prepare a grid-search hyperparameter sweep, saving to a config file.

To initiate a grid-search sweep using the created JSON file, pass this file to the params argument of Project.train():

>>> P.train('outcome', params='sweep.json', ...)
Parameters:
  • filename (str, optional) – Filename for hyperparameter sweep. Overwrites existing files. Saves in project root directory. Defaults to “sweep.json”.

  • label (str, optional) – Label to use when naming models in sweep. Defaults to None.

  • **kwargs – Parameters to include in the sweep. Parameters may either be fixed or provided as lists.
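How fixed and list-valued parameters combine into a grid can be sketched in plain Python. This is an illustration of grid-search expansion in general, assuming list values are swept and all other values are held fixed; it is not the code Slideflow uses to write the JSON file, and expand_sweep and the parameter names below are hypothetical.

```python
import itertools
from typing import Any, Dict, List


def expand_sweep(**kwargs: Any) -> List[Dict[str, Any]]:
    """Expand keyword arguments into every combination of a grid search.

    List values are swept; any other value is held fixed.
    """
    names = list(kwargs)
    # Wrap fixed values in a one-item list so every parameter is iterable.
    choices = [v if isinstance(v, list) else [v] for v in kwargs.values()]
    return [dict(zip(names, combo)) for combo in itertools.product(*choices)]


# Two swept parameters with two values each, plus one fixed parameter.
grid = expand_sweep(learning_rate=[1e-3, 1e-4], batch_size=[32, 64], epochs=5)
print(len(grid))  # 4
```

The grid grows multiplicatively with each swept parameter, so sweeping many parameters at once can produce a large number of models to train.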

evaluate(self, model: str | None, outcomes: str | List[str], *, dataset: Dataset, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 0, checkpoint: str | None = None, eval_k_fold: int | None = None, splits: str = 'splits.json', max_tiles: int = 0, mixed_precision: bool = True, allow_tf32: bool = False, input_header: str | List[str] | None = None, load_method: str = 'weights', custom_objects: Dict[str, Any] | None = None, **kwargs: Any) → Dict

Evaluate a saved model on a given set of tfrecords.

Parameters:
  • model (str) – Path to model to evaluate.

  • outcomes (str) – Str or list of str. Annotation column header specifying the outcome label(s).

Keyword Arguments:
  • dataset (slideflow.Dataset, optional) – Dataset to evaluate. If not supplied, will evaluate all project tfrecords at the tile_px/tile_um matching the supplied model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Minimum number of tiles a slide must have to be included in evaluation. Defaults to 0.

  • checkpoint (str, optional) – Path to cp.ckpt file, if evaluating a saved checkpoint. Defaults to None.

  • eval_k_fold (int, optional) – K-fold iteration number to evaluate. Defaults to None. If None, will evaluate all tfrecords irrespective of K-fold.

  • splits (str, optional) – Filename of JSON file in which to log train/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • max_tiles (int, optional) – Maximum number of tiles from each slide to evaluate. Defaults to 0. If zero, will include all tiles.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

  • allow_tf32 (bool) – Allow internal use of TensorFloat-32 format. Defaults to False.

  • input_header (str, optional) – Annotation column header to use as additional input. Defaults to None.

  • load_method (str) – Either ‘full’ or ‘weights’. Method to use when loading a Tensorflow model. If ‘full’, loads the model with tf.keras.models.load_model(). If ‘weights’, will read the params.json configuration file, build the model architecture, and then load weights from the given model with Model.load_weights(). Loading with ‘full’ may improve compatibility across Slideflow versions. Loading with ‘weights’ may improve compatibility across hardware & environments. Defaults to ‘weights’.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

  • save_predictions (bool or str, optional) – Save tile, slide, and patient-level predictions at each evaluation. May be ‘csv’, ‘feather’, or ‘parquet’. If False, will not save predictions. Defaults to ‘parquet’.

  • custom_objects (dict, optional) – Dictionary mapping names (strings) to custom classes or functions. Defaults to None.

  • **kwargs – Additional keyword arguments to the Trainer.evaluate() function.

Returns:

Dictionary of keras training results, nested by epoch.

Return type:

Dict
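The difference between the ‘average’ and ‘proportion’ options of reduce_method can be illustrated on a handful of tile predictions. This is a sketch in plain Python under the description given for reduce_method; Slideflow itself is described as working with pandas DataFrames, and the function names here are hypothetical.

```python
from statistics import mean
from typing import List


def reduce_average(tile_preds: List[List[float]]) -> List[float]:
    """Average each category's prediction across all tiles."""
    return [mean(column) for column in zip(*tile_preds)]


def reduce_proportion(tile_preds: List[List[float]]) -> List[float]:
    """One-hot encode each tile's top category, then average across tiles."""
    onehot = []
    for row in tile_preds:
        top = row.index(max(row))  # index of the highest prediction
        onehot.append([1.0 if i == top else 0.0 for i in range(len(row))])
    return [mean(column) for column in zip(*onehot)]


# Four tiles from one slide, two categories.
tiles = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
print(reduce_average(tiles))     # approximately [0.6, 0.4]
print(reduce_proportion(tiles))  # [0.75, 0.25]
```

With ‘proportion’, a tile predicted at 0.6 counts the same as one predicted at 0.9, so the two methods can disagree when many tiles are weakly confident.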

evaluate_mil(self, model: str, outcomes: str | List[str], dataset: Dataset, bags: str | List[str], config: mil._TrainerConfig | None = None, **kwargs)

Evaluate a multi-instance learning model.

Saves results for the evaluation in the mil_eval project folder, including predictions (parquet format), attention (Numpy format for each slide), and attention heatmaps (if attention_heatmaps=True).

Logs classifier metrics (AUROC and AP) to the console.

Parameters:
Keyword Arguments:
  • exp_label (str) – Experiment label, used for naming the subdirectory in the {project root}/mil folder, where training history and the model will be saved.

  • attention_heatmaps (bool) – Calculate and save attention heatmaps. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.
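The idea behind the ‘two_slope’ option, scaling values below and above a center point separately, can be sketched without matplotlib. This is an illustration of the concept only, assuming a center of 0 as described for norm; the function two_slope is hypothetical and is not how Slideflow or matplotlib implement normalization.

```python
def two_slope(value: float, vmin: float, vmax: float, vcenter: float = 0.0) -> float:
    """Map a value to [0, 1], scaling each side of vcenter independently.

    vmin maps to 0, vcenter to 0.5, and vmax to 1.
    """
    if not vmin < vcenter < vmax:
        raise ValueError("expected vmin < vcenter < vmax")
    if value <= vcenter:
        return 0.5 * (value - vmin) / (vcenter - vmin)
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)


# Attention values ranging from -2 to 8: zero still lands on the colormap midpoint.
print(two_slope(0, vmin=-2, vmax=8))   # 0.5
print(two_slope(-1, vmin=-2, vmax=8))  # 0.25
print(two_slope(4, vmin=-2, vmax=8))   # 0.75
```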

evaluate_clam(self, exp_name: str, pt_files: str, outcomes: str | List[str], tile_px: int, tile_um: int | str, *, k: int = 0, eval_tag: str | None = None, filters: Dict | None = None, filter_blank: str | List[str] | None = None, attention_heatmaps: bool = True) → None

Deprecated function.

Evaluate CLAM model on activations and export attention heatmaps.

Parameters:
  • exp_name (str) – Name of experiment to evaluate (subfolder in clam/)

  • pt_files (str) – Path to pt_files containing tile-level features.

  • outcomes (str or list) – Annotation column that specifies labels.

  • tile_px (int) – Tile width in pixels.

  • tile_um (int or str) – Tile width in microns (int) or magnification (str, e.g. “20x”).

Keyword Arguments:
  • k (int, optional) – K-fold / split iteration to evaluate. Evaluates the model saved as s_{k}_checkpoint.pt. Defaults to 0.

  • eval_tag (str, optional) – Unique identifier for this evaluation. Defaults to None.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • attention_heatmaps (bool, optional) – Save attention heatmaps of validation dataset. Defaults to True.

Returns:

None

extract_cells(self, tile_px: int, tile_um: int | str, masks_path: str | None = None, *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, **kwargs: Any) → Dict[str, SlideReport]

Extract images of cells from whole-slide images.

Image tiles are extracted from cells, with a tile at each cell centroid. Requires that cells have already been segmented with Project.cell_segmentation(). This function is otherwise similar to slideflow.Project.extract_tiles(), with tiles saved in TFRecords by default.

Parameters:
  • tile_px (int) – Size of tiles to extract at cell centroids (pixels).

  • tile_um (int or str) – Size of tiles to extract, in microns (int) or magnification (str, e.g. “20x”).

  • masks_path (str, optional) – Location of saved masks. If None, will look in project default (subfolder ‘/masks’). Defaults to None.

Keyword Arguments:
  • apply_masks (bool) – Apply cell segmentation masks to the extracted tiles. Defaults to True.

  • **kwargs (Any) – All other keyword arguments are passed to Project.extract_tiles().

Returns:

Dictionary mapping slide paths to each slide’s SlideReport (slideflow.slide.report.SlideReport)

extract_tiles(self, tile_px: int, tile_um: int | str, *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, **kwargs: Any) → Dict[str, SlideReport]

Extract tiles from slides.

Preferred use is calling slideflow.Dataset.extract_tiles().

Parameters:
  • tile_px (int) – Size of tiles to extract, in pixels.

  • tile_um (int or str) – Size of tiles to extract, in microns (int) or magnification (str, e.g. “20x”).

Keyword Arguments:
  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • save_tiles (bool, optional) – Save tile images in loose format. Defaults to False.

  • save_tfrecords (bool) – Save compressed image data from extracted tiles into TFRecords in the corresponding TFRecord directory. Defaults to True.

  • source (str, optional) – Name of dataset source from which to select slides for extraction. Defaults to None. If not provided, will default to all sources in project.

  • stride_div (int) – Stride divisor for tile extraction. A stride_div of 1 will extract non-overlapping tiles. A stride_div of 2 will extract overlapping tiles, with a stride equal to 50% of the tile width. Defaults to 1.

  • enable_downsample (bool) – Enable downsampling for slides. This may result in corrupted image tiles if downsampled slide layers are corrupted or incomplete. Defaults to True.

  • roi_method (str) – Either ‘inside’, ‘outside’, ‘auto’, or ‘ignore’. Determines how ROIs are used to extract tiles. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and skip the slide if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

  • roi_filter_method (str or float) – Method of filtering tiles with ROIs. Either ‘center’ or float (0-1). If ‘center’, tiles are filtered with ROIs based on the center of the tile. If float, tiles are filtered based on the proportion of the tile inside the ROI, and roi_filter_method is interpreted as a threshold. If the proportion of a tile inside the ROI is greater than this number, the tile is included. For example, if roi_filter_method=0.7, a tile that is 80% inside of an ROI will be included, and a tile that is 50% inside of an ROI will be excluded. Defaults to ‘center’.

  • skip_extracted (bool) – Skip slides that have already been extracted. Defaults to True.

  • tma (bool) – Reads slides as tissue microarrays (TMAs). Deprecated argument; all slides are now read as standard WSIs.

  • randomize_origin (bool) – Randomize pixel starting position during extraction. Defaults to False.

  • buffer (str, optional) – Slides will be copied to this directory before extraction. Defaults to None. Using an SSD or ramdisk buffer vastly improves tile extraction speed.

  • q_size (int) – Size of queue when using a buffer. Defaults to 2.

  • qc (str, optional) – ‘otsu’, ‘blur’, ‘both’, or None. Quality control method: blur detection (discarding tiles with detected out-of-focus regions or artifact), Otsu’s method, or both. Increases tile extraction time. Defaults to None.

  • report (bool) – Save a PDF report of tile extraction. Defaults to True.

  • normalizer (str, optional) – Normalization strategy. Defaults to None.

  • normalizer_source (str, optional) – Stain normalization preset or path to a source image. Valid presets include ‘v1’, ‘v2’, and ‘v3’. If None, will use the default preset (‘v3’). Defaults to None.

  • whitespace_fraction (float, optional) – Range 0-1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering. Defaults to 1.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this threshold are considered grayspace.

  • img_format (str, optional) – ‘png’ or ‘jpg’. Defaults to ‘jpg’. Image format to use in tfrecords. PNG (lossless) for fidelity, JPG (lossy) for efficiency.

  • shuffle (bool, optional) – Shuffle tiles prior to storage in tfrecords. Defaults to True.

  • num_threads (int, optional) – Number of worker processes for each tile extractor. When using cuCIM slide reading backend, defaults to the total number of available CPU cores, using the ‘fork’ multiprocessing method. With Libvips, this defaults to the total number of available CPU cores or 32, whichever is lower, using ‘spawn’ multiprocessing.

  • qc_blur_radius (int, optional) – Quality control blur radius for out-of-focus area detection. Only used if quality control is enabled (see qc). Defaults to 3.

  • qc_blur_threshold (float, optional) – Quality control blur threshold for detecting out-of-focus areas. Only used if quality control is enabled (see qc). Defaults to 0.1.

  • qc_filter_threshold (float, optional) – Float between 0-1. Tiles with more than this proportion of blur will be discarded. Only used if quality control is enabled (see qc). Defaults to 0.6.

  • qc_mpp (float, optional) – Microns-per-pixel indicating image magnification level at which quality control is performed. Defaults to 4 (effective magnification 2.5x).

  • dry_run (bool, optional) – Determine tiles that would be extracted, but do not export any images. Defaults to None.

  • max_tiles (int, optional) – Only extract this many tiles per slide. Defaults to None.
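The threshold rules described for roi_filter_method and whitespace_fraction can be written out as small predicates. This is a sketch under the descriptions above, not Slideflow's code: the function names are hypothetical, and the use of a strict comparison for whitespace is an assumption chosen so that a value of 1 disables filtering.

```python
from typing import Union


def passes_roi_filter(
    fraction_inside: float,
    center_inside: bool,
    roi_filter_method: Union[str, float] = "center",
) -> bool:
    """Decide whether a tile is kept under the documented ROI filtering rules."""
    if roi_filter_method == "center":
        # Keep the tile only if its center point falls inside the ROI.
        return center_inside
    if isinstance(roi_filter_method, (int, float)) and 0 <= roi_filter_method <= 1:
        # Keep the tile if the proportion inside the ROI exceeds the threshold.
        return fraction_inside > roi_filter_method
    raise ValueError("roi_filter_method must be 'center' or a float in [0, 1]")


def passes_whitespace_filter(whitespace: float, whitespace_fraction: float = 1.0) -> bool:
    """Keep a tile unless its whitespace fraction exceeds the limit (assumed strict)."""
    return whitespace <= whitespace_fraction


# The example from the roi_filter_method description, with a threshold of 0.7.
print(passes_roi_filter(0.8, center_inside=False, roi_filter_method=0.7))  # True
print(passes_roi_filter(0.5, center_inside=True, roi_filter_method=0.7))   # False
```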

Returns:

Dictionary mapping slide paths to each slide’s SlideReport (slideflow.slide.report.SlideReport)
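The effect of stride_div on tile overlap and tile count can be worked through numerically. This is a minimal sketch assuming the stride is the tile width divided by stride_div using integer division; the helper names are hypothetical, and Slideflow's handling of slide edges and rounding may differ.

```python
def tile_stride(tile_px: int, stride_div: int = 1) -> int:
    """Distance in pixels between the origins of adjacent tiles."""
    if stride_div < 1:
        raise ValueError("stride_div must be at least 1")
    return tile_px // stride_div


def tiles_per_row(width_px: int, tile_px: int, stride_div: int = 1) -> int:
    """Number of full tiles that fit across a region of the given width."""
    if width_px < tile_px:
        return 0
    return (width_px - tile_px) // tile_stride(tile_px, stride_div) + 1


# A 1024-pixel-wide region with 256-pixel tiles.
print(tiles_per_row(1024, 256, stride_div=1))  # 4 non-overlapping tiles
print(tiles_per_row(1024, 256, stride_div=2))  # 7 tiles with 50% overlap
```

Because the stride applies in both dimensions, doubling stride_div roughly quadruples the number of tiles extracted from a slide.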

gan_train(self, dataset: Dataset, *, model: str = 'stylegan3', outcomes: str | List[str] | None = None, exp_label: str | None = None, mirror: bool = True, metrics: str | List[str] | None = None, dry_run: bool = False, normalizer: str | None = None, normalizer_source: str | None = None, tile_labels: str | None = None, crop: int | None = None, resize: int | None = None, **kwargs) → None

Train a GAN network.

Examples

Train StyleGAN2 from a Slideflow dataset.

>>> P = sf.Project('/project/path')
>>> dataset = P.dataset(tile_px=512, tile_um=400)
>>> P.gan_train(dataset=dataset, exp_label="MyExperiment", ...)

Train StyleGAN2 as a class-conditional network.

>>> P.gan_train(..., outcomes='class_label')

Train using a pretrained network.

>>> P.gan_train(..., resume='/path/to/network.pkl')

Train with multiple GPUs.

>>> P.gan_train(..., gpus=4)
Parameters:

dataset (slideflow.Dataset) – Training dataset.

Keyword Arguments:
  • allow_tf32 (bool) – Allow internal use of TensorFloat-32. Option only available for StyleGAN2. Defaults to True.

  • aug (str) – Augmentation mode. Options include ‘ada’, ‘noaug’, ‘fixed’. Defaults to ‘ada’.

  • augpipe (str) – Augmentation pipeline. Options include ‘blit’, ‘geom’, ‘color’, ‘filter’, ‘noise’, ‘cutout’, ‘bg’, ‘bgc’, ‘bgcfnc’. Only available for StyleGAN2. Defaults to ‘bgcfnc’.

  • batch (int, optional) – Override batch size set by cfg.

  • cfg (str) – StyleGAN2 base configuration. Options include ‘auto’, ‘stylegan2’, ‘paper256’, ‘paper512’, ‘paper1024’, and ‘cifar’. Defaults to ‘auto’.

  • dry_run (bool) – Set up training but do not execute. Defaults to False.

  • exp_label (str, optional) – Experiment label. Defaults to None.

  • freezed (int) – Freeze this many discriminator layers. Defaults to 0.

  • fp32 (bool, optional) – Disable mixed-precision training. Defaults to False.

  • gamma (float, optional) – Override R1 gamma from configuration (set with cfg).

  • gpus (int) – Number of GPUs to train on in parallel. Defaults to 1.

  • kimg (int) – Override training duration in kimg (thousand images) set by cfg. Most configurations default to 25,000 kimg (25 million images).

  • lazy_resume (bool) – Allow lazy loading from saved pretrained networks, for example to load a non-conditional network when training a conditional network. Defaults to False.

  • mirror (bool) – Randomly flip/rotate images during training. Defaults to True.

  • metrics (str, list(str), optional) – Metrics to calculate during training. Options include ‘fid50k’, ‘is50k’, ‘ppl_zfull’, ‘ppl_wfull’, ‘ppl_zend’, ‘ppl2_wend’, ‘ls’, and ‘pr50k3’. Defaults to None.

  • model (str) – Architecture to train. Valid model architectures include “stylegan2” and “stylegan3”. Defaults to “stylegan3”.

  • nhwc (bool) – Use NHWC memory format with FP16. Defaults to False.

  • nobench (bool) – Disable cuDNN benchmarking. Defaults to False.

  • outcomes (str, list(str), optional) – Class conditioning outcome labels for training a class-conditioned GAN. If not provided, trains an unconditioned GAN. Defaults to None.

  • tile_labels (str, optional) – Path to pandas dataframe with tile-level labels. The dataframe should be indexed by tile name, where the name of the tile follows the format: [slide name]-[tile x coordinate]-[tile y coordinate], e.g.: slide1-251-666. The dataframe should have a single column with the name ‘label’. Labels can be categorical or continuous. If categorical, the labels should be onehot encoded.

  • crop (int, optional) – Randomly crop images to this target size during training. This permits training a smaller network (e.g. 256 x 256) on larger images (e.g. 299 x 299). Defaults to None.

  • resize (int, optional) – Resize images to this target size during training. This permits training a smaller network (e.g. 256 x 256) on larger images (e.g. 299 x 299). If both crop and resize are provided, cropping will be performed first. Defaults to None.

  • resume (str) – Load previous network. Options include ‘noresume’, ‘ffhq256’, ‘ffhq512’, ‘ffhq1024’, ‘celebahq256’, ‘lsundog256’, <file>, or <url>. Defaults to ‘noresume’.

  • snap (int) – Snapshot interval for saving network and example images. Defaults to 50 ticks.
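The tile naming format and one-hot encoding described for tile_labels can be sketched with a few helpers. This uses plain Python rather than pandas so that it runs without dependencies; the helper names and the category list are hypothetical, and building and saving the actual dataframe is left out.

```python
from typing import List, Tuple


def tile_name(slide: str, x: int, y: int) -> str:
    """Format a tile name as [slide name]-[tile x coordinate]-[tile y coordinate]."""
    return f"{slide}-{x}-{y}"


def parse_tile_name(name: str) -> Tuple[str, int, int]:
    """Split a tile name into (slide, x, y).

    Splitting from the right keeps hyphens inside the slide name intact.
    """
    slide, x, y = name.rsplit("-", 2)
    return slide, int(x), int(y)


def onehot(category: str, categories: List[str]) -> List[int]:
    """One-hot encode a categorical label against an ordered category list."""
    if category not in categories:
        raise ValueError(f"unknown category: {category}")
    return [int(category == c) for c in categories]


# Index -> label mapping, mirroring a dataframe with a single 'label' column.
labels = {tile_name("slide1", 251, 666): onehot("tumor", ["normal", "tumor"])}
print(labels)  # {'slide1-251-666': [0, 1]}
```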

gan_generate(self, network_pkl: str, out: str, seeds: List[int], **kwargs) → None

Generate images from a trained GAN network.

Examples

Save images as .jpg for seeds 0-99.

>>> network_pkl = '/path/to/trained/gan.pkl'
>>> P.gan_generate(
...     network_pkl,
...     out='/dir',
...     format='jpg',
...     seeds=range(100))

Save images in TFRecord format.

>>> P.gan_generate(..., out='target.tfrecords')

Save images of class ‘0’ for a class-conditional GAN.

>>> P.gan_generate(..., class_idx=0)

Resize GAN images (trained at 512 px / 400 um) to match a target tile size (299 px / 302 um).

>>> P.gan_generate(
...     ...,
...     gan_px=512,
...     gan_um=400,
...     target_px=299,
...     target_um=302)
Parameters:
  • network_pkl (str) – Path to a trained StyleGAN2 network (.pkl)

  • out (str) – Directory in which to save generated images.

  • seeds (list(int)) – Seeds for which images will be generated.

Keyword Arguments:
  • format (str, optional) – Image format, either ‘jpg’ or ‘png’. Defaults to ‘png’.

  • truncation_psi (float, optional) – Truncation PSI. Defaults to 1.

  • noise_mode (str, optional) – Either ‘const’, ‘random’, or ‘none’. Defaults to ‘const’.

  • class_idx (int, optional) – Class index to generate, for class-conditional networks. Defaults to None.

  • save_projection (bool, optional) – Save weight projection for each generated image as an .npz file in the out directory. Defaults to False.

  • resize (bool, optional) – Crop/resize images to a target micron/pixel size. Defaults to False.

  • gan_um (int, optional) – Size of GAN images in microns. Used for cropping/resizing images to a target size. Defaults to None.

  • gan_px (int, optional) – Size of GAN images in pixels. Used for cropping/resizing images to a target size. Defaults to None.

  • target_um (int, optional) – Crop/resize GAN images to this micron size. Defaults to None.

  • target_px (int, optional) – Crop/resize GAN images to this pixel size. Defaults to None.
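The arithmetic behind cropping and resizing GAN images to a target tile size can be worked through with the sizes from the example above (512 px / 400 um to 299 px / 302 um). This is one plausible reading, assuming a central crop to the target micron width followed by a resize to target_px; the helper names are hypothetical and Slideflow's actual rounding may differ.

```python
from typing import Tuple


def crop_width_px(gan_px: int, gan_um: float, target_um: float) -> int:
    """Width in pixels of the region of a GAN image spanning target_um microns."""
    if target_um > gan_um:
        raise ValueError("target_um cannot exceed gan_um")
    return int(gan_px * target_um / gan_um)


def center_crop_box(gan_px: int, crop_px: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered square crop."""
    offset = (gan_px - crop_px) // 2
    return offset, offset, offset + crop_px, offset + crop_px


crop = crop_width_px(gan_px=512, gan_um=400, target_um=302)
print(crop)                        # 386
print(center_crop_box(512, crop))  # (63, 63, 449, 449)
# The cropped 386 x 386 image would then be resized to target_px (299 x 299).
```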

generate_features(self, model: str | None, dataset: Dataset | None = None, *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 0, max_tiles: int = 0, outcomes: List[str] | None = None, **kwargs: Any) → DatasetFeatures

Calculate layer activations.

See Layer activations for more information.

Parameters:
  • model (str) – Path to model.

  • dataset (slideflow.Dataset, optional) – Dataset from which to generate activations. If not supplied, calculate activations for all tfrecords compatible with the model, optionally using provided filters and filter_blank.

Keyword Arguments:
  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Only include slides with this minimum number of tiles. Defaults to 0.

  • max_tiles (int, optional) – Only include maximum of this many tiles per slide. Defaults to 0 (all tiles).

  • outcomes (list, optional) – Column header(s) in annotations file. Used for category-level comparisons. Defaults to None.

  • layers (list(str)) – Layers from which to generate activations. Defaults to ‘postconv’.

  • export (str) – Path to CSV file. Save activations in CSV format. Defaults to None.

  • cache (str) – Path to PKL file. Cache activations at this location. Defaults to None.

  • include_preds (bool) – Generate and store logit predictions along with layer activations. Defaults to True.

  • batch_size (int) – Batch size to use when calculating activations. Defaults to 32.

Returns:

slideflow.DatasetFeatures

generate_feature_bags(self, model: str | BaseFeatureExtractor, dataset: Dataset | None = None, outdir: str = 'auto', *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 16, max_tiles: int = 0, force_regenerate: bool = False, batch_size: int = 32, slide_batch_size: int = 16, num_gpus: int = 0, **kwargs: Any) → str

Generate tile-level features for slides for use with MIL models.

By default, features are exported to the pt_files folder within the project root directory.

Parameters:
  • model (str) – Path to model from which to generate activations. May provide either this or “pt_files”.

  • dataset (slideflow.Dataset, optional) – Dataset from which to generate activations. If not supplied, calculate activations for all tfrecords compatible with the model, optionally using provided filters and filter_blank.

  • outdir (str, optional) – Save exported activations in .pt format. Defaults to ‘auto’ (project directory).

Keyword Arguments:
  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Only include slides with this minimum number of tiles. Defaults to 16.

  • max_tiles (int, optional) – Only include maximum of this many tiles per slide. Defaults to 0 (all tiles).

  • layers (list) – Which model layer(s) generate activations. If model is a saved model, this defaults to ‘postconv’. Defaults to None.

  • force_regenerate (bool) – Forcibly regenerate activations for all slides even if .pt file exists. Defaults to False.

  • batch_size (int) – Batch size during feature calculation. Defaults to 32.

  • slide_batch_size (int) – Interleave feature calculation across this many slides. Higher values may improve performance but require more memory. Defaults to 16.

  • **kwargs – Additional keyword arguments are passed to slideflow.DatasetFeatures.

Returns:

Path to directory containing exported .pt files

generate_heatmaps(self, model: str | None, *, dataset: Dataset, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 0, outdir: str | None = None, resolution: str = 'low', batch_size: int = 32, roi_method: str = 'auto', num_threads: int | None = None, img_format: str = 'auto', skip_completed: bool = False, verbose: bool = True, **kwargs: Any) → None

Create predictive heatmap overlays on a set of slides.

By default, heatmaps are saved in the heatmaps/ folder in the project root directory.

Parameters:

model (str) – Path to Tensorflow model.

Keyword Arguments:
  • dataset (slideflow.Dataset, optional) – Dataset from which to generate predictions. If not supplied, will generate predictions for all project tfrecords at the tile_px/tile_um matching the model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Minimum tiles per slide. Skip slides not meeting this threshold. Defaults to 0.

  • outdir (path, optional) – Directory in which to save heatmap images.

  • resolution (str, optional) – Heatmap resolution. Defaults to ‘low’. “low” uses a stride equal to tile width. “medium” uses a stride equal to 1/2 tile width. “high” uses a stride equal to 1/4 tile width.

  • batch_size (int, optional) – Batch size during heatmap calculation. Defaults to 32.

  • roi_method (str) – Either ‘inside’, ‘outside’, ‘auto’, or ‘ignore’. Determines how ROIs are used to extract tiles. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and raise errors.MissingROIError if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

  • num_threads (int, optional) – Number of worker threads for each tile extractor. Defaults to the total number of available CPU threads.

  • img_format (str, optional) – Image format (png, jpg) to use when extracting tiles from slide. Must match the image format the model was trained on. If ‘auto’, will use the format logged in the model params.json.

  • skip_completed (bool, optional) – Skip heatmaps for slides that already have heatmaps in target directory.

  • show_roi (bool) – Show ROI on heatmaps.

  • interpolation (str) – Interpolation strategy for predictions. Defaults to None. Includes all matplotlib imshow interpolation options.

  • logit_cmap – Function or dict used to create the heatmap colormap. If None (default), separate heatmaps are generated for each category, with color representing category prediction. Each image tile will generate a list of preds of length O, where O is the number of label categories. If logit_cmap is a function, the logit predictions will be passed to it, and the function is expected to return [R, G, B] values. If logit_cmap is a dictionary, it should map ‘r’, ‘g’, and ‘b’ to label indices; the predictions for these label categories will be mapped to the corresponding colors. Thus, the colors will reflect predictions for at most three labels. Example (this would map predictions for label 0 to red, 3 to green, etc.): {‘r’: 0, ‘g’: 3, ‘b’: 1}

  • verbose (bool) – Show verbose output. Defaults to True.

  • vmin (float) – Minimum value to display on heatmap. Defaults to 0.

  • vcenter (float) – Center value for color display on heatmap. Defaults to 0.5.

  • vmax (float) – Maximum value to display on heatmap. Defaults to 1.
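
The two forms of logit_cmap described above can be sketched in plain Python. This is an illustration only, not library code: the helper names and sample predictions are hypothetical, and the 0-255 channel scaling is an assumption, since the expected range of the returned [R, G, B] values is not stated here.

```python
# Hypothetical predictions for one tile with O = 4 label categories.
preds = [0.2, 0.8, 0.0, 0.0]

def rgb_from_preds(preds):
    """Function form: receive per-category predictions, return [R, G, B].

    Assumption: channels are scaled to 0-255; the reference above does not
    state the expected range.
    """
    return [int(round(255 * preds[0])),   # label 0 -> red
            int(round(255 * preds[3])),   # label 3 -> green
            int(round(255 * preds[1]))]   # label 1 -> blue

# Dictionary form: map color channels to label indices.
cmap_dict = {'r': 0, 'g': 3, 'b': 1}

def rgb_from_dict(preds, cmap):
    """Shows what the dictionary form expresses (illustrative helper)."""
    return [int(round(255 * preds[cmap[c]])) for c in ('r', 'g', 'b')]

# Both forms describe the same color for this tile.
assert rgb_from_preds(preds) == rgb_from_dict(preds, cmap_dict)
```

Either `rgb_from_preds` or `cmap_dict` would then be supplied through the logit_cmap keyword argument.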

generate_mosaic(self, df: DatasetFeatures, dataset: Dataset | None = None, *, filters: Dict | None = None, filter_blank: str | List[str] | None = None, outcomes: str | List[str] | None = None, map_slide: str | None = None, show_prediction: str | int | None = None, predict_on_axes: List[int] | None = None, max_tiles: int = 0, umap_cache: str | None = None, use_float: bool = False, low_memory: bool = False, use_norm: bool = True, umap_kwargs: Dict = {}, **kwargs: Any) Mosaic

Generate a mosaic map.

See Mosaic maps for more information.

Parameters:
  • df (slideflow.DatasetFeatures) – DatasetFeatures object from which to generate the mosaic.

  • dataset (slideflow.Dataset, optional) – Dataset from which to generate mosaic. If not supplied, will generate mosaic for all tfrecords at the tile_px/tile_um matching the supplied model, optionally using filters/filter_blank.

Keyword Arguments:
  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • outcomes (list, optional) – Column name in annotations file from which to read category labels.

  • map_slide (str, optional) – None (default), ‘centroid’ or ‘average’. If provided, will map slides using slide-level calculations, either mapping centroid tiles if ‘centroid’, or calculating node averages across tiles in a slide and mapping slide-level node averages, if ‘average’.

  • show_prediction (int or str, optional) – May be either int or str, corresponding to label category. Predictions for this category will be displayed on the exported UMAP plot.

  • max_tiles (int, optional) – Limits tiles taken from each slide. Defaults to 0.

  • umap_cache (str, optional) – Path to PKL file in which to save/cache UMAP coordinates. Defaults to None.

  • use_float (bool, optional) – Interpret labels as continuous instead of categorical. Defaults to False.

  • umap_kwargs (dict, optional) – Dictionary of keyword arguments to pass to the UMAP function.

  • low_memory (bool, optional) – Limit memory during UMAP calculations. Defaults to False.

  • use_norm (bool, optional) – Display image tiles using the normalizer used during model training (if applicable). Detected from a model’s metadata file (params.json). Defaults to True.

  • figsize (Tuple[int, int], optional) – Figure size. Defaults to (200, 200).

  • num_tiles_x (int) – Specifies the size of the mosaic map grid.

  • expanded (bool) – Deprecated argument.

Returns:

Mosaic object.

Return type:

slideflow.Mosaic

generate_mosaic_from_annotations(self, header_x: str, header_y: str, *, dataset: Dataset, model: str | None = None, outcomes: str | List[str] | None = None, max_tiles: int = 100, use_optimal_tile: bool = False, cache: str | None = None, batch_size: int = 32, **kwargs: Any) Mosaic

Generate a mosaic map with manually supplied x/y coordinates.

Slides are mapped with slide-level annotations, with the x-axis determined from header_x and the y-axis from header_y. If use_optimal_tile is False and no model is provided, the first image tile in each TFRecord will be displayed. If use_optimal_tile is True, layer activations for all tiles in each slide are calculated using the provided model, and the tile nearest to the centroid is used.

Parameters:
  • header_x (str) – Annotations file header with X-axis coords.

  • header_y (str) – Annotations file header with Y-axis coords.

Keyword Arguments:
  • dataset (slideflow.Dataset) – Dataset object.

  • model (str, optional) – Path to model to use when generating layer activations. Defaults to None. If not provided, mosaic will not be calculated or saved. If provided, saved in project mosaic directory.

  • outcomes (list(str)) – Column name(s) in annotations file from which to read category labels.

  • max_tiles (int, optional) – Limits the number of tiles taken from each slide. Defaults to 100.

  • use_optimal_tile (bool, optional) – Use the model to calculate layer activations for all tiles in each slide, and choose the tile nearest the centroid of each slide for display. Defaults to False.

  • cache (str, optional) – Path to PKL file to cache node activations. Defaults to None.

  • batch_size (int, optional) – Batch size for model. Defaults to 32.

  • figsize (Tuple[int, int], optional) – Figure size. Defaults to (200, 200).

  • num_tiles_x (int) – Specifies the size of the mosaic map grid.

  • expanded (bool) – Deprecated argument.

Returns:

slideflow.Mosaic

generate_tfrecord_heatmap(self, tfrecord: str, tile_px: int, tile_um: int | str, tile_dict: Dict[int, float], outdir: str | None = None) None

Create a tfrecord-based WSI heatmap.

Uses a dictionary of tile values for heatmap display, saving to project root directory.

Parameters:
  • tfrecord (str) – Path to tfrecord

  • tile_dict (dict) – Dictionary mapping tfrecord indices to a tile-level value for display in heatmap format

  • tile_px (int) – Tile width in pixels

  • tile_um (int or str) – Tile width in microns (int) or magnification (str, e.g. “20x”).

  • outdir (str, optional) – Destination path to save heatmap.

Returns:

None
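
As a minimal sketch of how tile_dict might be assembled, the snippet below builds the index-to-value mapping from a list of tile-level values. The values, the zero-based indexing, and the `make_heatmap` helper are assumptions made for illustration; only the generate_tfrecord_heatmap() parameters come from the signature above.

```python
# Hypothetical tile-level values, ordered by position in the tfrecord.
tile_values = [0.91, 0.13, 0.48, 0.77]

# tile_dict maps each tfrecord index (int) to the value shown in the heatmap.
# Assumption: indices start at 0 and follow the record order.
tile_dict = {index: float(value) for index, value in enumerate(tile_values)}

def make_heatmap(project, tfrecord_path, outdir=None):
    """Sketch of the call; `project` is assumed to be a loaded sf.Project."""
    project.generate_tfrecord_heatmap(
        tfrecord_path,
        tile_px=299,          # illustrative tile width in pixels
        tile_um=302,          # illustrative tile width in microns
        tile_dict=tile_dict,
        outdir=outdir,
    )
```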

dataset(self, tile_px: int | None = None, tile_um: str | int | None = None, *, verification: str | None = 'both', **kwargs: Any) Dataset

Return a slideflow.Dataset object using project settings.

Parameters:
  • tile_px (int) – Tile size in pixels

  • tile_um (int or str) – Tile size in microns (int) or magnification (str, e.g. “20x”).

Keyword Arguments:
  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Min tiles a slide must have. Defaults to 0.

  • config (str, optional) – Path to dataset configuration JSON file. Defaults to project default.

  • sources (str, list(str), optional) – Dataset sources to use from configuration. Defaults to project default.

  • verification (str, optional) – ‘tfrecords’, ‘slides’, or ‘both’. If ‘slides’, verify all annotations are mapped to slides. If ‘tfrecords’, check that TFRecords exist and update manifest. Defaults to ‘both’.
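
The keyword arguments above might be combined as in the following sketch. The annotation column names and values are hypothetical, and the filter layout (a column name mapped to a list of accepted values) is an assumption; see slideflow.Dataset.filter() for the authoritative format.

```python
# Hypothetical annotation column and accepted values.
filters = {'cohort': ['train', 'validation']}

def load_filtered_dataset(project):
    """Sketch: `project` is assumed to be a loaded slideflow.Project."""
    return project.dataset(
        tile_px=299,               # illustrative tile size in pixels
        tile_um=302,               # illustrative tile size in microns
        filters=filters,
        filter_blank=['outcome'],  # hypothetical annotation column
        min_tiles=10,
        verification='tfrecords',
    )
```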

predict(self, model: str | None, *, dataset: Dataset, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 0, checkpoint: str | None = None, eval_k_fold: int | None = None, splits: str = 'splits.json', max_tiles: int = 0, batch_size: int = 32, format: str = 'csv', input_header: str | List[str] | None = None, mixed_precision: bool = True, allow_tf32: bool = False, load_method: str = 'weights', custom_objects: Dict[str, Any] | None = None, **kwargs: Any) Dict[str, DataFrame]

Generate model predictions on a set of tfrecords.

Parameters:

model (str) – Path to model to evaluate.

Keyword Arguments:
  • dataset (slideflow.Dataset, optional) – Dataset from which to generate predictions. If not supplied, will generate predictions for all project tfrecords at the tile_px/tile_um matching the model, optionally using provided filters and filter_blank.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Min tiles a slide must have to be included. Defaults to 0.

  • checkpoint (str, optional) – Path to cp.ckpt file, if evaluating a saved checkpoint. Defaults to None.

  • eval_k_fold (int, optional) – K-fold iteration number to evaluate. If None, will evaluate all tfrecords irrespective of K-fold. Defaults to None.

  • splits (str, optional) – Filename of JSON file in which to log training/validation splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • max_tiles (int, optional) – Maximum number of tiles from each slide to evaluate. If zero, will include all tiles. Defaults to 0.

  • batch_size (int, optional) – Batch size to use during prediction. Defaults to 32.

  • format (str, optional) – Format in which to save predictions. Either ‘csv’, ‘feather’, or ‘parquet’. Defaults to ‘csv’.

  • input_header (str, optional) – Annotation column header to use as additional input. Defaults to None.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • load_method (str) – Either ‘full’ or ‘weights’. Method to use when loading a Tensorflow model. If ‘full’, loads the model with tf.keras.models.load_model(). If ‘weights’, will read the params.json configuration file, build the model architecture, and then load weights from the given model with Model.load_weights(). Loading with ‘full’ may improve compatibility across Slideflow versions. Loading with ‘weights’ may improve compatibility across hardware & environments.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

  • custom_objects (dict, optional) – Dictionary mapping names (strings) to custom classes or functions. Defaults to None.

Returns:

Dictionary of predictions dataframes, with the keys ‘tile’, ‘slide’, and ‘patient’.
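
To make the difference between the ‘average’ and ‘proportion’ settings of reduce_method concrete, the following plain-Python sketch reduces a handful of hypothetical tile-level predictions to a slide-level prediction. It mirrors the description above and is not the library's implementation.

```python
# Hypothetical tile-level predictions for one slide, two label categories.
tile_preds = [
    [0.9, 0.1],
    [0.6, 0.4],
    [0.2, 0.8],
    [0.7, 0.3],
]

def reduce_average(preds):
    """'average' / 'mean': average each category's prediction across tiles."""
    n = len(preds)
    return [sum(col) / n for col in zip(*preds)]

def reduce_proportion(preds):
    """'proportion': one-hot encode each tile's top category, then average."""
    n_cat = len(preds[0])
    onehot = [[1 if i == row.index(max(row)) else 0 for i in range(n_cat)]
              for row in preds]
    return [sum(col) / len(preds) for col in zip(*onehot)]

slide_average = reduce_average(tile_preds)        # approximately [0.6, 0.4]
slide_proportion = reduce_proportion(tile_preds)  # [0.75, 0.25]
```

Three of the four tiles favor the first category, so ‘proportion’ reports 0.75 for it even though the averaged prediction is closer to 0.6.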

predict_ensemble(self, model: str, k: int | None = None, epoch: int | None = None, **kwargs) None

Evaluate an ensemble of models on a given set of tfrecords.

Parameters:

model (str) – Path to ensemble model to evaluate.

Keyword Arguments:
  • k (int, optional) – K-fold number to use for prediction. Defaults to the first k-fold present in the ensemble folder.

  • epoch (int, optional) – Epoch number to use for prediction. Defaults to the first epoch present in the selected k-fold folder.

  • **kwargs (Any) – All keyword arguments accepted by slideflow.Project.predict()

predict_wsi(self, model: str | None, outdir: str, *, dataset: Dataset, filters: Dict | None = None, filter_blank: str | List[str] | None = None, min_tiles: int = 0, stride_div: int = 1, enable_downsample: bool = True, roi_method: str = 'auto', source: str | None = None, img_format: str = 'auto', randomize_origin: bool = False, **kwargs: Any) None

Generate a map of predictions across a whole-slide image.

Parameters:
  • model (str) – Path to model from which to generate predictions.

  • outdir (str) – Directory for saving WSI predictions in .pkl format.

Keyword Arguments:
  • dataset (slideflow.Dataset, optional) – Dataset from which to generate predictions. If not supplied, will generate predictions for all tfrecords at the tile_px/tile_um matching the supplied model.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • min_tiles (int, optional) – Min tiles a slide must have to be included. Defaults to 0.

  • stride_div (int, optional) – Stride divisor for extracting tiles. A stride_div of 1 will extract non-overlapping tiles. A stride_div of 2 will extract overlapping tiles, with a stride equal to 50% of the tile width. Defaults to 1.

  • enable_downsample (bool, optional) – Enable downsampling for slides. This may result in corrupted image tiles if downsampled slide layers are corrupted or incomplete. Defaults to True.

  • roi_method (str) – Either ‘inside’, ‘outside’, ‘auto’, or ‘ignore’. Determines how ROIs are used to extract tiles. If ‘inside’ or ‘outside’, will extract tiles in/out of an ROI, and raise errors.MissingROIError if an ROI is not available. If ‘auto’, will extract tiles inside an ROI if available, and across the whole-slide if no ROI is found. If ‘ignore’, will extract tiles across the whole-slide regardless of whether an ROI is available. Defaults to ‘auto’.

  • source (list, optional) – Name(s) of dataset sources from which to get slides. If None, will use all.

  • img_format (str, optional) – Image format (png, jpg) to use when extracting tiles from slide. Must match the image format the model was trained on. If ‘auto’, will use the format logged in the model params.json.

  • randomize_origin (bool, optional) – Randomize pixel starting position during extraction. Defaults to False.

  • whitespace_fraction (float, optional) – Range 0-1. Defaults to 1. Discard tiles with this fraction of whitespace. If 1, will not perform whitespace filtering.

  • whitespace_threshold (int, optional) – Range 0-255. Defaults to 230. Threshold above which a pixel (RGB average) is whitespace.

  • grayspace_fraction (float, optional) – Range 0-1. Defaults to 0.6. Discard tiles with this fraction of grayspace. If 1, will not perform grayspace filtering.

  • grayspace_threshold (float, optional) – Range 0-1. Defaults to 0.05. Pixels in HSV format with saturation below this are grayspace.
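
The stride_div and whitespace parameters above can be illustrated with a short, library-independent sketch. The helper names are hypothetical, and two details are assumptions rather than documented behavior: that the stride is rounded down to whole pixels, and that a tile is discarded once its whitespace fraction reaches whitespace_fraction.

```python
def stride_px(tile_px, stride_div):
    """Stride in pixels between adjacent tiles (assumes integer rounding)."""
    return tile_px // stride_div

def whitespace_fraction(pixels, whitespace_threshold=230):
    """Fraction of pixels whose RGB average exceeds the threshold."""
    white = sum(1 for (r, g, b) in pixels
                if (r + g + b) / 3 > whitespace_threshold)
    return white / len(pixels)

def keep_tile(pixels, max_fraction=1.0, whitespace_threshold=230):
    """Assumption: discard when the whitespace fraction reaches max_fraction;
    a max_fraction of 1 disables whitespace filtering."""
    if max_fraction >= 1:
        return True
    return whitespace_fraction(pixels, whitespace_threshold) < max_fraction

# A hypothetical 4-pixel tile: two near-white pixels, two tissue-colored.
tile = [(250, 250, 250), (240, 235, 238), (180, 90, 150), (120, 60, 110)]
```

With a 256-pixel tile, a stride_div of 2 gives a 128-pixel stride, so neighboring tiles overlap by half their width.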

save(self) None

Save current project configuration as settings.json.

Train a model using SMAC3 Bayesian hyperparameter optimization.

See Bayesian optimization for more information.

Note

The hyperparameter optimization is performed with SMAC3 and requires the smac package available from pip.

Parameters:
  • outcomes (str, List[str]) – Outcome label annotation header(s).

  • params (ModelParams) – Model parameters for training.

  • smac_configspace (ConfigurationSpace) – ConfigurationSpace to determine the SMAC optimization.

  • smac_limit (int) – Max number of models to train during optimization. Defaults to 10.

  • smac_metric (str, optional) – Metric to monitor for optimization. May either be a callable function or a str. If a callable function, must accept the epoch results dict and return a float value. If a str, must be a valid metric, such as ‘tile_auc’, ‘patient_auc’, ‘r_squared’, etc. Defaults to ‘tile_auc’.

  • save_checkpoints (bool) – Save model checkpoints. Defaults to False.

  • save_model (bool) – Save each trained model. Defaults to False.

  • save_predictions (bool or str, optional) – Save tile, slide, and patient-level predictions at each evaluation. May be ‘csv’, ‘feather’, or ‘parquet’. If False, will not save predictions. Defaults to False.

Returns:

Configuration: Optimal hyperparameter configuration returned by SMAC4BB.optimize().

pd.DataFrame: History of hyperparameters and resulting metrics.

Return type:

Tuple
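
When smac_metric is given as a callable, it must accept the epoch results dict and return a float. The sketch below shows the general shape of such a function. The layout of the results dict is not documented here, so the keys and list values used below are purely hypothetical; inspect a real results dict before adapting this.

```python
def mean_auc_metric(epoch_results):
    """Return a single float summarizing one epoch's results.

    Assumption: `epoch_results` stores lists of per-outcome AUC values under
    the keys 'tile_auc' and 'patient_auc'. This layout is hypothetical.
    """
    values = []
    for key in ('tile_auc', 'patient_auc'):
        values.extend(epoch_results.get(key, []))
    if not values:
        raise ValueError('no AUC values found in epoch results')
    return float(sum(values) / len(values))

# Hypothetical results dict, used only to exercise the function.
example_results = {'tile_auc': [0.80], 'patient_auc': [0.90]}
score = mean_auc_metric(example_results)
```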

train(self, outcomes: str | List[str], params: str | ModelParams | List[ModelParams] | Dict[str, ModelParams], *, dataset: Dataset | None = None, exp_label: str | None = None, filters: Dict | None = None, filter_blank: str | List[str] | None = None, input_header: str | List[str] | None = None, min_tiles: int = 0, max_tiles: int = 0, splits: str = 'splits.json', mixed_precision: bool = True, allow_tf32: bool = False, load_method: str = 'weights', balance_headers: str | List[str] | None = None, process_isolate: bool = False, **training_kwargs: Any) Dict

Train model(s).

Models are trained using a given set of parameters, outcomes, and (optionally) slide-level inputs.

See Training for more information.

Examples

Method 1 (hyperparameter sweep from a configuration file):

>>> P.train('outcome', params='sweep.json', ...)

Method 2 (manually specified hyperparameters):

>>> hp = sf.ModelParams(...)
>>> P.train('outcome', params=hp, ...)

Method 3 (list of hyperparameters):

>>> hp = [sf.ModelParams(...), sf.ModelParams(...)]
>>> P.train('outcome', params=hp, ...)

Method 4 (dict of hyperparameters):

>>> hp = {'HP0': sf.ModelParams(...), ...}
>>> P.train('outcome', params=hp, ...)
Parameters:
  • outcomes (str or list(str)) – Outcome label annotation header(s).

  • params (slideflow.ModelParams, list, dict, or str) – Model parameters for training. May provide one ModelParams, a list, or a dict mapping model names to params. If multiple params are provided, will train models for each. If a JSON file is provided, will interpret it as a hyperparameter sweep. See examples above for use.

Keyword Arguments:
  • exp_label (str, optional) – Experiment label to add to model names.

  • filters (dict, optional) – Dataset filters to use for selecting slides. See slideflow.Dataset.filter() for more information. Defaults to None.

  • filter_blank (list(str) or str, optional) – Skip slides that have blank values in these patient annotation columns. Defaults to None.

  • input_header (list, optional) – List of annotation column headers to use as additional slide-level model input. Defaults to None.

  • min_tiles (int) – Minimum number of tiles a slide must have to include in training. Defaults to 0.

  • max_tiles (int) – Only use up to this many tiles from each slide for training. Defaults to 0 (include all tiles).

  • splits (str, optional) – Filename of JSON file in which to log train/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • mixed_precision (bool, optional) – Enable mixed precision. Defaults to True.

  • allow_tf32 (bool) – Allow internal use of Tensorfloat-32 format. Defaults to False.

  • load_method (str) – Either ‘full’ or ‘weights’. Method to use when loading a Tensorflow model. If ‘full’, loads the model with tf.keras.models.load_model(). If ‘weights’, will read the params.json configuration file, build the model architecture, and then load weights from the given model with Model.load_weights(). Loading with ‘full’ may improve compatibility across Slideflow versions. Loading with ‘weights’ may improve compatibility across hardware & environments.

  • balance_headers (str or list(str)) – Annotation header(s) specifying labels on which to perform mini-batch balancing. If performing category-level balancing and this is set to None, will default to balancing on outcomes. Defaults to None.

  • val_strategy (str) – Validation dataset selection strategy. Options include bootstrap, k-fold, k-fold-manual, k-fold-preserved-site, fixed, and none. Defaults to ‘k-fold’.

  • val_k_fold (int) – Total number of K if using K-fold validation. Defaults to 3.

  • val_k (int) – Iteration of K-fold to train, starting at 1. Defaults to None (training all k-folds).

  • val_k_fold_header (str) – Annotations file header column for manually specifying k-fold or for preserved-site cross validation. Only used if validation strategy is ‘k-fold-manual’ or ‘k-fold-preserved-site’. Defaults to None for k-fold-manual and ‘site’ for k-fold-preserved-site.

  • val_fraction (float) – Fraction of dataset to use for validation testing, if strategy is ‘fixed’.

  • val_source (str) – Dataset source to use for validation. Defaults to None (same as training).

  • val_annotations (str) – Path to annotations file for validation dataset. Defaults to None (same as training).

  • val_filters (dict) – Filters to use for validation dataset. See slideflow.Dataset.filter() for more information. Defaults to None (same as training).

  • checkpoint (str, optional) – Path to cp.ckpt from which to load weights. Defaults to None.

  • pretrain (str, optional) – Either ‘imagenet’ or path to Tensorflow model from which to load weights. Defaults to ‘imagenet’.

  • multi_gpu (bool) – Train using multiple GPUs when available. Defaults to False.

  • reduce_method (str, optional) – Reduction method for calculating slide-level and patient-level predictions for categorical outcomes. Options include ‘average’, ‘mean’, ‘proportion’, ‘median’, ‘sum’, ‘min’, ‘max’, or a callable function. ‘average’ and ‘mean’ are synonymous, with both options kept for backwards compatibility. If ‘average’ or ‘mean’, will reduce with average of each logit across tiles. If ‘proportion’, will convert tile predictions into onehot encoding then reduce by averaging these onehot values. For all other values, will reduce with the specified function, applied via the pandas DataFrame.agg() function. Defaults to ‘average’.

  • resume_training (str, optional) – Path to model to continue training. Only valid in Tensorflow backend. Defaults to None.

  • starting_epoch (int) – Start training at the specified epoch. Defaults to 0.

  • steps_per_epoch_override (int) – If provided, will manually set the number of steps in an epoch. Default epoch length is the number of total tiles.

  • save_predictions (bool or str, optional) – Save tile, slide, and patient-level predictions at each evaluation. May be ‘csv’, ‘feather’, or ‘parquet’. If False, will not save predictions. Defaults to ‘parquet’.

  • save_model (bool, optional) – Save models when evaluating at specified epochs. Defaults to True.

  • validate_on_batch (int) – Perform validation every N batches. Defaults to 0 (only at epoch end).

  • validation_batch_size (int) – Validation dataset batch size. Defaults to 32.

  • use_tensorboard (bool) – Add tensorboard callback for realtime training monitoring. Defaults to True.

  • validation_steps (int) – Number of steps of validation to perform each time doing a mid-epoch validation check. Defaults to 200.

Returns:

Dict with model names mapped to train_acc, val_loss, and val_acc
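
A fuller call combining several of the keyword arguments above might look like the following sketch. The argument values, the annotation column names, and the `run_training` helper are illustrative; the assumption that ModelParams accepts tile_px and tile_um keywords should be checked against the ModelParams reference.

```python
# Keyword arguments drawn from the list above; the values are illustrative.
train_kwargs = dict(
    filters={'cohort': ['train']},   # hypothetical annotation column
    min_tiles=10,
    val_strategy='k-fold',
    val_k_fold=3,
    save_predictions='csv',
)

def run_training(project_root, outcome='outcome'):
    """Sketch: requires slideflow and an existing project at project_root."""
    import slideflow as sf  # imported lazily so the snippet loads without it

    P = sf.Project(project_root)
    # Assumption: ModelParams accepts tile_px and tile_um keywords.
    hp = sf.ModelParams(tile_px=299, tile_um=302)
    return P.train(outcome, params=hp, **train_kwargs)
```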

train_ensemble(self, outcomes: str | List[str], params: ModelParams | List[ModelParams] | Dict[str, ModelParams], n_ensembles: int | None = None, **kwargs) List[Dict]

Train an ensemble of model(s).

Trains models using a given set of parameters and outcomes by calling the train function n_ensembles times.

Parameters:
  • outcomes (str or list(str)) – Outcome label annotation header(s).

  • params (slideflow.ModelParams, list or dict) – Model parameters for training. May provide one ModelParams, a list, or a dict mapping model names to params. If multiple params are provided, will train a hyper-deep ensemble of models from them; otherwise, will train a deep ensemble model.

Keyword Arguments:
  • n_ensembles (int, optional) – Total models needed in the ensemble. Defaults to 5.

  • **kwargs – All keyword arguments accepted by slideflow.Project.train()

Returns:

List of dictionaries of length n_ensembles, containing training results for each member of the ensemble.

train_mil(self, config: mil._TrainerConfig, train_dataset: Dataset, val_dataset: Dataset, outcomes: str | List[str], bags: str | List[str], *, exp_label: str | None = None, **kwargs)

Train a multi-instance learning model.

Parameters:
Keyword Arguments:
  • exp_label (str) – Experiment label, used for naming the subdirectory in the {project root}/mil folder, where training history and the model will be saved.

  • attention_heatmaps (bool) – Calculate and save attention heatmaps on the validation dataset. Defaults to False.

  • interpolation (str, optional) – Interpolation strategy for smoothing attention heatmaps. Defaults to ‘bicubic’.

  • cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.

  • norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.
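
The ‘two_slope’ option for norm normalizes values below and above 0 separately. The plain-Python sketch below illustrates that idea; it resembles the behavior of matplotlib.colors.TwoSlopeNorm with a center of 0, though whether Slideflow uses that class internally is an assumption. The function name and sample scores are hypothetical.

```python
def two_slope_normalize(values, vcenter=0.0):
    """Scale values below and above vcenter separately into [0, 1].

    Values below vcenter map linearly onto [0, 0.5] and values above it
    onto [0.5, 1], so vcenter always lands on the middle of the colormap.
    Assumes the values span both sides of vcenter.
    """
    vmin, vmax = min(values), max(values)
    out = []
    for v in values:
        if v < vcenter:
            out.append(0.5 * (v - vmin) / (vcenter - vmin))
        else:
            out.append(0.5 + 0.5 * (v - vcenter) / (vmax - vcenter))
    return out

attention = [-2.0, -1.0, 0.0, 0.5, 4.0]      # hypothetical attention scores
normalized = two_slope_normalize(attention)  # [0.0, 0.25, 0.5, 0.5625, 1.0]
```

Because each side is scaled independently, a narrow negative range and a wide positive range both use their full half of the colormap.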

train_simclr(self, simclr_args: simclr.SimCLR_Args, train_dataset: Dataset, val_dataset: Dataset | None = None, *, exp_label: str | None = None, outcomes: str | List[str] | None = None, dataset_kwargs: Dict[str, Any] | None = None, normalizer: str | sf.norm.StainNormalizer | None = None, normalizer_source: str | None = None, **kwargs) None

Train SimCLR model.

Models are saved in the simclr folder in the project root directory.

See Self-Supervised Learning (SSL) for more information.

Parameters:
Keyword Arguments:
  • exp_label (str, optional) – Experiment label to add to model names.

  • outcomes (str, optional) – Annotation column which specifies the outcome, for optionally training a supervised head. Defaults to None.

  • dataset_kwargs – All other keyword arguments for slideflow.Dataset.tensorflow()

  • **kwargs – All other keyword arguments for slideflow.simclr.run_simclr()

train_clam(self, *args, splits: str = 'splits.json', **kwargs)

Deprecated function.

Train a CLAM model from layer activations exported with slideflow.Project.generate_features_for_clam().

Preferred API is slideflow.Project.train_mil().

See Multiple-Instance Learning (MIL) for more information.

Examples

Train with basic settings:

>>> dataset = P.dataset(tile_px=299, tile_um=302)
>>> P.generate_features_for_clam('/model', outdir='/pt_files')
>>> P.train_clam('NAME', '/pt_files', 'category1', dataset)

Specify a specific layer from which to generate activations:

>>> P.generate_features_for_clam(..., layers=['postconv'])

Manually configure CLAM, with 5-fold validation and SVM bag loss:

>>> import slideflow.clam as clam
>>> clam_args = clam.get_args(k=5, bag_loss='svm')
>>> P.generate_features_for_clam(...)
>>> P.train_clam(..., clam_args=clam_args)
Parameters:
  • exp_name (str) – Name of experiment. Makes clam/{exp_name} folder.

  • pt_files (str) – Path to pt_files containing tile-level features.

  • outcomes (str) – Annotation column which specifies the outcome.

  • dataset (slideflow.Dataset) – Dataset object from which to generate activations.

  • train_slides (str, optional) – List of slide names for training. If ‘auto’ (default), will auto-generate training/val split.

  • validation_slides (str, optional) – List of slides for validation. If ‘auto’ (default), will auto-generate training/val split.

  • splits (str, optional) – Filename of JSON file in which to log training/val splits. Looks for filename in project root directory. Defaults to “splits.json”.

  • clam_args (optional) – Namespace with clam arguments, as provided by slideflow.clam.get_args().

  • attention_heatmaps (bool, optional) – Save attention heatmaps of validation dataset.

Returns:

None