Layer Activations

Investigating the latent space of a neural network can provide useful insights into the structure of your data and what models have learned during training. Slideflow provides several tools for post-hoc latent space analysis of trained neural networks, primarily by calculating activations at one or more neural network layers for all images in a dataset. In the next sections, we will take a look at how these layer activations can be calculated for downstream analysis and provide examples of analyses that can be performed.

Calculating Layer Activations

Activations at one or more layers of a trained network can be calculated with slideflow.model.Features and slideflow.DatasetFeatures. The former provides an interface for calculating layer activations for a batch of images, and the latter supervises calculations across an entire dataset.

Batch of images

Features provides an interface for calculating layer activations and predictions on a batch of images. The following arguments are available:

  • path: Path to model, from which layer activations are calculated. Required.

  • layers: Layer(s) at which to calculate activations.

  • include_preds: Also return the final network output (predictions).

  • pooling: Apply pooling to layer activations, reducing each to a one-dimensional vector (see the example below).

If layers is not supplied, activations at the post-convolutional layer will be calculated by default.
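
For example, a minimal sketch using the pooling argument ('avg' is assumed here to be a supported value; consult the API documentation for the available options):

import slideflow as sf

# Average-pool layer activations so each image yields a 1D feature vector
# ('avg' is an assumed value; see the API docs for supported options)
pooled = sf.model.Features('/model/path', layers='sep_conv_3', pooling='avg')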

Once initialized, the resulting object can be called on a batch of images and will return the layer activations for all images in the batch. For example, to calculate activations at the sep_conv_3 layer of a model while looping through a dataset:

import slideflow as sf

sepconv3 = sf.model.Features('/model/path', layers='sep_conv_3')
for img_batch in dataset:
    sepconv3_activations = sepconv3(img_batch)

If layers is a list of layer names, activations at each layer will be calculated and concatenated. If include_preds is True, the interface will also return the final predictions:

sepconv3_and_preds = sf.model.Features(..., include_preds=True)
layer_activations, preds = sepconv3_and_preds(img_batch)
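
For instance, a brief sketch of concatenating activations from two layers (the layer names here are illustrative; available names depend on the model architecture):

# Activations from each named layer are concatenated per image
multi = sf.model.Features('/model/path', layers=['sep_conv_2', 'sep_conv_3'])
multi_activations = multi(img_batch)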

Note

Features assumes that image batches already have any necessary preprocessing applied, including standardization and stain normalization.
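
For example, a minimal sketch of stain-normalizing and standardizing an image tile before feature calculation ('reinhard' is one available normalization method; the standardization scaling shown is illustrative and depends on the model's preprocessing):

import numpy as np
import slideflow as sf

# Dummy uint8 RGB tile, for illustration
img = np.random.randint(0, 256, (299, 299, 3), dtype=np.uint8)

# Stain-normalize the tile ('reinhard' is one of several methods)
normalizer = sf.norm.StainNormalizer('reinhard')
img = normalizer.transform(img)

# Standardize to the input range the model expects; this scaling is
# illustrative and depends on the model's preprocessing
img = img.astype(np.float32) / 127.5 - 1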

See the API documentation for Features for more information.

Single slide

Layer activations can also be calculated across an entire slide using the same Features interface. Calling the object on a slideflow.WSI object will generate a grid of activations of size (slide.grid.shape[0], slide.grid.shape[1], num_features):

import slideflow as sf

slide = sf.WSI(...)
postconv = sf.model.Features('/model/path', layers='postconv')
feature_grid = postconv(slide)
print(feature_grid.shape)
(50, 45, 2048)
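
Since the returned grid is a numpy array, individual feature channels can be visualized spatially. For example:

import matplotlib.pyplot as plt

# Plot the first feature channel as a spatial heatmap over the slide
plt.imshow(feature_grid[:, :, 0])
plt.colorbar()
plt.show()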

Entire dataset

Finally, layer activations can be calculated for an entire dataset using slideflow.DatasetFeatures. This class supervises the calculation and caching of layer activations, which can then be used for downstream analysis. The project function slideflow.Project.generate_features() creates and returns an instance of this class.

dts_ftrs = P.generate_features('/path/to/trained_model')

Alternatively, you can create an instance of this class directly:

import slideflow as sf

dataset = P.dataset(tile_px=299, tile_um=302)
dts_ftrs = sf.DatasetFeatures(
  model='/path/to/trained_model',
  dataset=dataset,
)

Tile-level feature activations for each slide can be accessed directly from DatasetFeatures.activations, a dict mapping slide names to numpy arrays of shape (num_tiles, num_features). Predictions are stored in DatasetFeatures.predictions, a dict mapping slide names to numpy arrays of shape (num_tiles, num_classes). Tile-level location data, the (x, y) coordinates at which each tile was extracted from its source slide, is stored in DatasetFeatures.locations, a dict mapping slide names to numpy arrays of shape (num_tiles, 2).
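
For example, to inspect the arrays for a single slide (the slide name 'slide1' is illustrative):

# Tile-level activations for one slide: shape (num_tiles, num_features)
slide_activations = dts_ftrs.activations['slide1']

# Matching tile locations: shape (num_tiles, 2), as (x, y) coordinates
slide_locations = dts_ftrs.locations['slide1']

print(slide_activations.shape, slide_locations.shape)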

Activations can be exported to a Pandas DataFrame with slideflow.DatasetFeatures.to_df() or exported into PyTorch format with slideflow.DatasetFeatures.to_torch(). See Generating Features for more information about generating and exporting features for MIL models.
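
For example (the destination directory below is illustrative):

# Export all tile-level activations to a pandas DataFrame
df = dts_ftrs.to_df()

# Export in PyTorch format to a directory
dts_ftrs.to_torch('/path/to/output')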

Read the API documentation for slideflow.DatasetFeatures for more information.

Mapping Activations

Layer activations across a dataset can be dimensionality reduced with UMAP and plotted for visualization using slideflow.DatasetFeatures.map_activations(). This function returns an instance of slideflow.SlideMap, a class that provides easy access to labeling and plotting.

The example below calculates layer activations at the neural network layer sep_conv_3 for an entire dataset, then reduces the activations to two dimensions with UMAP for easy visualization. Any valid UMAP parameters can be passed as keyword arguments.

dts_ftrs = P.generate_features(
  model='/path/to/trained_model',
  layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations(
    n_neighbors=10, # UMAP parameter
    min_dist=0.2    # UMAP parameter
)

We can then plot the activations with slideflow.SlideMap.plot(). All keyword arguments are passed to the matplotlib scatter function.

import matplotlib.pyplot as plt

slide_map.plot(s=10)
plt.show()

We can add labels to our plot by first passing a dictionary with slide labels to the function slideflow.SlideMap.label_by_slide().

# Get a dictionary mapping slide names to category labels
dataset = P.dataset(tile_px=299, tile_um='10x')
labels, unique_labels = dataset.labels('subtype', format='name')

# Assign the labels to the slide map, then plot
slide_map.label_by_slide(labels)
slide_map.plot()
[Image: example UMAP plot of layer activations, labeled by subtype (umap_example.png)]

Finally, we can use SlideMap.umap_transform() to project new data into two dimensions using the previously fit UMAP.

import slideflow as sf
import numpy as np

# Create a SlideMap using layer activations reduced with UMAP
dts_ftrs = P.generate_features(
  model='/path/to/trained_model',
  layers='sep_conv_3'
)
slide_map = dts_ftrs.map_activations()

# Load some dummy data.
# Second dimension must match size of activation vector.
dummy = np.random.random((100, 1024))

# Transform the data using the already-fit UMAP.
transformed = slide_map.umap_transform(dummy)
print(transformed.shape)
(100, 2)

Read more about additional slideflow.SlideMap functions, including saving, loading, and clustering, in the linked API documentation.
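
As a brief, hedged sketch (the cluster() signature shown is an assumption; consult the API documentation for exact usage):

# Save the UMAP plot to disk
slide_map.save('umap.png')

# Cluster tiles in the reduced space (the n_clusters argument is assumed)
slide_map.cluster(n_clusters=5)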

Mosaic Maps

Mosaic maps provide a tool for visualizing the distribution of histologic image features in a dataset through analysis of neural network layer activations. Similar to activation atlases, a mosaic map is generated by first calculating layer activations for a dataset, dimensionality reducing these activations with UMAP, and then overlaying corresponding images in a grid-wise fashion.

[Image: example mosaic map (mosaic_example.png)]

In the previous sections, we reviewed how to calculate layer activations across a dataset and reduce these activations to two dimensions with UMAP. slideflow.Mosaic provides a tool for converting these activation maps into a grid of image tiles, each plotted according to its associated activation vector.

Quickstart

The fastest way to build a mosaic map is using slideflow.Project.generate_mosaic, which requires a DatasetFeatures object as its only mandatory argument and returns an instance of slideflow.Mosaic.

dts_ftrs = P.generate_features('/path/to/trained_model', layers='postconv')
mosaic = P.generate_mosaic(dts_ftrs)
mosaic.save('mosaic.png')

When created with this interface, the underlying slideflow.SlideMap object used to create the mosaic map is accessible via slideflow.Mosaic.slide_map. You could, for example, use slideflow.SlideMap.save() to save the UMAP plot:

mosaic.slide_map.save('umap.png')

From a SlideMap

Any SlideMap can be converted to a mosaic map with slideflow.SlideMap.generate_mosaic().

ftrs = P.generate_features('/path/to/model')
slide_map = ftrs.map_activations()
mosaic = slide_map.generate_mosaic()
mosaic.save('mosaic.png')

Manual creation

Mosaic maps can be flexibly created with slideflow.Mosaic, requiring two components: a set of images and corresponding coordinates. Images and coordinates can either be manually provided, or the mosaic can dynamically read images from TFRecords (as is done with Project.generate_mosaic()).

The first argument of slideflow.Mosaic provides the images, and may be either of the following:

  • A list or array of images (np.ndarray, HxWxC)

  • A list of tuples, each containing (tfrecord, tfrecord_index)

The second argument provides the coordinates:

  • A list or array of (x, y) coordinates for each image

For example, to create a mosaic map from a list of images and coordinates:

# Example data (each image is an HxWxC np.ndarray)
images = [np.ndarray(...), ...]
coords = [(0.2, 0.9), ...]

# Generate the mosaic
mosaic = sf.Mosaic(images, coords)
mosaic.plot()

You can also generate a mosaic map where the images are tuples of (tfrecord, tfrecord_index). In this case, the mosaic map will dynamically read images from TFRecords during plotting.

# Example data
tfrecords = ['/path/to/tfrecord.tfrecords', ...]
idx = [253, 112, ...]
coords = [(0.2, 0.9), ...]

# Generate mosaic map, pairing each TFRecord with its tile index
mosaic = sf.Mosaic(
    images=list(zip(tfrecords, idx)),
    coords=coords
)

There are several additional arguments that can be used to customize the mosaic map plotting. Read the linked API documentation for slideflow.Mosaic for more information.
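
For instance, a hedged sketch adjusting the mosaic grid resolution (the num_tiles_x parameter name is an assumption; check the API documentation for the supported arguments):

# Build a mosaic with a finer grid (parameter name assumed)
mosaic = sf.Mosaic(images, coords, num_tiles_x=100)
mosaic.save('mosaic_highres.png')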