Investigating the latent space of a neural network can provide useful insights into the structure of your data and what models have learned during training. Slideflow provides several tools for post-hoc latent space analysis of trained neural networks, primarily by calculating activations at one or more neural network layers for all images in a dataset. In the next sections, we will take a look at how these layer activations can be calculated for downstream analysis and provide examples of analyses that can be performed.
Calculating Layer Activations¶
Activations at one or more layers of a trained network can be calculated with
slideflow.model.Features or
slideflow.DatasetFeatures. The former provides an interface for calculating layer activations for a batch of images, and the latter supervises calculations across an entire dataset.
Batch of images¶
slideflow.model.Features provides an interface for calculating layer activations and predictions on a batch of images. The following arguments are available:
path: Path to model, from which layer activations are calculated. Required.
layers: Layer(s) at which to calculate activations.
include_preds: Also return the final network output (predictions).
pooling: Apply pooling to layer activations, reducing them to one dimension.
If layers is not supplied, activations at the post-convolutional layer will be calculated by default.
Once initialized, the resulting object can be called on a batch of images and will return the layer activations for all images in the batch. For example, to calculate activations at the
sep_conv_3 layer of a model while looping through a dataset:
import slideflow as sf

sepconv3 = sf.model.Features('model/path', layers='sep_conv_3')

for img_batch in dataset:
    activations = sepconv3(img_batch)
If layers is a list of layer names, activations at each layer will be calculated and concatenated. If include_preds is True, the interface will also return the final predictions:
sepconv3_and_preds = sf.model.Features(..., include_preds=True)
layer_activations, preds = sepconv3_and_preds(img_batch)
Features assumes that image batches already have any necessary preprocessing applied, including standardization and stain normalization.
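As an illustration, below is a minimal sketch of standardizing a uint8 image batch before passing it to the interface. The standardize helper is hypothetical; substitute whatever preprocessing (e.g. stain normalization) your model was trained with.

import numpy as np

def standardize(img_batch):
    # Hypothetical helper: per-image standardization (zero mean, unit variance).
    batch = np.asarray(img_batch, dtype=np.float32)
    mean = batch.mean(axis=(1, 2, 3), keepdims=True)
    std = batch.std(axis=(1, 2, 3), keepdims=True)
    return (batch - mean) / np.maximum(std, 1e-6)

# Preprocess before calculating activations:
# activations = sepconv3(standardize(img_batch))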
See the API documentation for
Features for more information.
Layer activations can also be calculated across an entire slide using the same
Features interface. Calling the object on a
slideflow.WSI object will generate a grid of activations of size
(slide.grid.shape[0], slide.grid.shape[1], num_features):
import slideflow as sf

slide = sf.WSI(...)
postconv = sf.model.Features('/model/path', layers='postconv')
feature_grid = postconv(slide)
print(feature_grid.shape)
(50, 45, 2048)
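Indexing the first two axes of this grid returns the feature vector for a single tile location. For example (the grid indices below are arbitrary):

# Feature vector for the tile at one grid location
tile_features = feature_grid[10, 20]
print(tile_features.shape)  # (2048,)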
Finally, layer activations can also be calculated for an entire dataset using
slideflow.DatasetFeatures. Instantiating the class supervises the calculation and caching of layer activations, which can then be used for downstream analysis. The project function
slideflow.Project.generate_features() creates and returns an instance of this class.
dts_ftrs = P.generate_features('/path/to/trained_model')
Alternatively, you can create an instance of this class directly:
import slideflow as sf

dataset = P.dataset(tile_px=299, tile_um=302)
dts_ftrs = sf.DatasetFeatures(
    model='/path/to/trained_model',
    dataset=dataset,
)
Tile-level feature activations for each slide can be accessed directly from
DatasetFeatures.activations, a dict mapping slide names to numpy arrays of shape
(num_tiles, num_features). Predictions are stored in
DatasetFeatures.predictions, a dict mapping slide names to numpy arrays of shape
(num_tiles, num_classes). Tile-level location data (the coordinates at which tiles were extracted from their respective source slides) is stored in
DatasetFeatures.locations, a dict mapping slide names to numpy arrays of shape
(num_tiles, 2).
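For example, to inspect the arrays stored for a single slide (the slide chosen below is arbitrary):

# Inspect tile-level data for one slide
slide = list(dts_ftrs.activations.keys())[0]
print(dts_ftrs.activations[slide].shape)   # (num_tiles, num_features)
print(dts_ftrs.predictions[slide].shape)   # (num_tiles, num_classes)
print(dts_ftrs.locations[slide].shape)     # (num_tiles, 2)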
Activations can be exported to a Pandas DataFrame with
slideflow.DatasetFeatures.to_df() or exported into PyTorch format with
slideflow.DatasetFeatures.to_torch(). See Multiple-Instance Learning (MIL) for more information about generating and exporting features for MIL models.
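For instance, a brief sketch of exporting to a DataFrame (column contents will depend on the model and layers used):

# Export tile-level activations to a pandas DataFrame
df = dts_ftrs.to_df()
print(df.head())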
Read the API documentation for
slideflow.DatasetFeatures for more information.
Layer activations across a dataset can be dimensionality reduced with UMAP and plotted for visualization using
slideflow.DatasetFeatures.map_activations(). This function returns an instance of
slideflow.SlideMap, a class that provides easy access to labeling and plotting.
The below example calculates layer activations at the neural network layer
sep_conv_3 for an entire dataset, and then reduces the activations into two dimensions for easy visualization using UMAP. Any valid UMAP parameters can be passed via keyword argument.
dts_ftrs = P.generate_features(
    model='/path/to/trained_model',
    layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations(
    n_neighbors=10,  # UMAP parameter
    min_dist=0.2     # UMAP parameter
)
import matplotlib.pyplot as plt

slide_map.plot(s=10)
plt.show()
We can add labels to our plot by first passing a dictionary with slide labels to the function
slideflow.SlideMap.label_by_slide():
# Get a dictionary mapping slide names to category labels
dataset = P.dataset(tile_px=299, tile_um='10x')
labels, unique_labels = dataset.labels('subtype', format='name')

# Assign the labels to the slide map, then plot
slide_map.label_by_slide(labels)
slide_map.plot()
Finally, we can use
SlideMap.umap_transform() to project new data into two dimensions using the previously fit UMAP.
import slideflow as sf
import numpy as np

# Create a SlideMap using layer activations reduced with UMAP
dts_ftrs = P.generate_features(
    model='/path/to/trained_model',
    layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations()

# Load some dummy data.
# Second dimension must match size of activation vector.
dummy = np.random.random((100, 1024))

# Transform the data using the already-fit UMAP.
transformed = slide_map.umap_transform(dummy)
print(transformed.shape)
Read more about additional
slideflow.SlideMap functions, including saving, loading, and clustering, in the linked API documentation.
Mosaic maps provide a tool for visualizing the distribution of histologic image features in a dataset through analysis of neural network layer activations. Similar to activation atlases, a mosaic map is generated by first calculating layer activations for a dataset, dimensionality reducing these activations with UMAP, and then overlaying corresponding images in a grid-wise fashion.
In the previous sections, we reviewed how to calculate layer activations across a dataset, and then dimensionality reduce these activations into two dimensions using UMAP.
slideflow.Mosaic provides a tool for converting these activation maps into a grid of image tiles plotted according to their associated activation vectors.
dts_ftrs = P.generate_features('/path/to/trained_model', layers='postconv')
mosaic = P.generate_mosaic(dts_ftrs)
mosaic.save('mosaic.png')
When created with this interface, the underlying
slideflow.SlideMap object used to create the mosaic map is accessible via
slideflow.Mosaic.slide_map. You could, for example, use
slideflow.SlideMap.save() to save the UMAP plot:
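# Save the UMAP plot underlying the mosaic map
# (the output filename is illustrative)
mosaic.slide_map.save('umap.png')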
From a SlideMap¶
A slideflow.SlideMap can be converted to a mosaic map with
slideflow.SlideMap.generate_mosaic():
ftrs = P.generate_features('/path/to/model')
slide_map = ftrs.map_activations()
mosaic = slide_map.generate_mosaic()
mosaic.save('mosaic.png')
Mosaic maps can be flexibly created with
slideflow.Mosaic, requiring two components: a set of images and corresponding coordinates. Images and coordinates can either be manually provided, or the mosaic can dynamically read images from TFRecords (as is done with the
slideflow.Project.generate_mosaic() and
slideflow.SlideMap.generate_mosaic() interfaces described above).
The first argument of
slideflow.Mosaic provides the images, and may be either of the following:
A list or array of images (np.ndarray, HxWxC)
A list of tuples, each containing (tfrecord, tfrecord_index)
The second argument provides the coordinates:
A list or array of (x, y) coordinates for each image
For example, to create a mosaic map from a list of images and coordinates:
import numpy as np
import slideflow as sf

# Example data (images are HxWxC, np.ndarray)
images = [np.ndarray(...), ...]
coords = [(0.2, 0.9), ...]

# Generate the mosaic
mosaic = sf.Mosaic(images, coords)
mosaic.plot()
You can also generate a mosaic map where the images are tuples of (tfrecord, tfrecord_index). In this case, the mosaic map will dynamically read images from TFRecords during plotting.
# Example data
tfrecords = ['/path/to/tfrecord.tfrecords', ...]
idx = [253, 112, ...]
coords = [(0.2, 0.9), ...]

# Generate mosaic map
mosaic = sf.Mosaic(
    images=[(tfr, i) for tfr, i in zip(tfrecords, idx)],
    coords=coords
)
There are several additional arguments that can be used to customize the mosaic map plotting. Read the linked API documentation for
slideflow.Mosaic for more information.