Custom Feature Extractors

Slideflow includes several pretrained feature extractors for converting image tiles into feature vectors, as well as tools to help you build your own. In this note, we'll walk through building a custom feature extractor from either a PyTorch or a Tensorflow model.

PyTorch

PyTorch feature extractors are implemented as subclasses of slideflow.model.extractors._factory_torch.TorchFeatureExtractor. The base class provides core functionality and helper methods for generating features from image tiles (dtype uint8) or whole-slide images (type slideflow.WSI).

The initializer should create the feature extraction model and move it to the appropriate device (e.g., a GPU). The model should be a torch.nn.Module that accepts an image tensor as input and returns a feature tensor as output.

# Import your custom torch.nn.Module,
# which generates features from an image.
from my_module import MyModel

from slideflow.model.extractors._factory_torch import TorchFeatureExtractor

class MyFeatureExtractor(TorchFeatureExtractor):

    tag = 'my_feature_extractor'  # Human-readable identifier

    def __init__(self):
        super().__init__()

        # Create the model, move to GPU, and set in evaluation mode.
        self.model = MyModel()
        self.model.to('cuda')
        self.model.eval()

Next, the initializer should set the number of features expected to be returned by the model.

...

    def __init__(self):
        ...

        self.num_features = 1024

The initializer is also responsible for registering image preprocessing. The image preprocessing transformation, a function which converts a raw uint8 image to a float32 tensor for model input, should be stored in self.transform. If the transformation standardizes the images, then the parameter self.preprocess_kwargs should be set to {'standardize': False}, indicating that Slideflow should not perform any additional standardization. You can use the class method .build_transform() to use the standard preprocessing pipeline.

from torchvision import transforms

...

    def __init__(self):
        ...

        # Image preprocessing.
        self.transform = self.build_transform(img_size=256)
        # Disable Slideflow standardization,
        # as we are standardizing with transforms.Normalize
        self.preprocess_kwargs = {'standardize': False}
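
To make the idea of a standardizing transformation concrete, the arithmetic can be sketched in plain Python for a single pixel: scale uint8 values to [0, 1], then subtract a per-channel mean and divide by a per-channel standard deviation. The values below are the ImageNet channel statistics commonly used with torchvision models. Whether .build_transform() applies these exact values is an assumption, and normalize_pixel is a hypothetical helper used only for illustration.

```python
# ImageNet channel statistics, widely used with torchvision models.
# Assumption: shown for illustration; your model may expect other values.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb):
    """Convert one uint8 RGB pixel to standardized floats (hypothetical helper)."""
    scaled = [c / 255.0 for c in rgb]  # uint8 [0, 255] -> float [0, 1]
    return [(s - m) / d for s, m, d in zip(scaled, MEAN, STD)]


white = normalize_pixel((255, 255, 255))
black = normalize_pixel((0, 0, 0))
print(white, black)
```

If your transformation already performs a step like this, applying Slideflow's own standardization on top would standardize the image twice, which is why 'standardize' is set to False above.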

The final required method is .dump_config(), which returns a dictionary of the configuration parameters needed to regenerate this class. The dictionary should have "class" and "kwargs" keys. This configuration is saved to a JSON configuration file when generating bags for MIL training.

...

    def dump_config(self):
        return self._dump_config(
            class_name='my_module.MyFeatureExtractor'
        )
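
To illustrate why the configuration needs both entries, here is a minimal sketch of how a "class"/"kwargs" dictionary can be written to JSON and later used to rebuild an object. The helper rebuild_from_config is hypothetical and is not part of Slideflow; a standard-library class stands in for a feature extractor so the example runs without any dependencies.

```python
import importlib
import json


def rebuild_from_config(config):
    """Hypothetical helper: import config['class'] and call it with config['kwargs']."""
    module_name, _, class_name = config['class'].rpartition('.')
    module = importlib.import_module(module_name)
    cls = getattr(module, class_name)
    return cls(**config['kwargs'])


# Stand-in configuration. A real one would name your extractor class,
# for example 'my_module.MyFeatureExtractor'.
config = {
    'class': 'fractions.Fraction',
    'kwargs': {'numerator': 3, 'denominator': 4},
}

# Round-trip through JSON, as happens when a config is saved to disk.
restored = json.loads(json.dumps(config))
obj = rebuild_from_config(restored)
print(obj)
```

This is also why every value in "kwargs" should be JSON-serializable: anything that cannot survive the round trip cannot be used to regenerate the class.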

The final class should look like this:

from my_module import MyModel
from slideflow.model.extractors._factory_torch import TorchFeatureExtractor
from torchvision import transforms

class MyFeatureExtractor(TorchFeatureExtractor):

    tag = 'my_feature_extractor'  # Human-readable identifier

    def __init__(self):
        super().__init__()

        # Create the model, move to GPU, and set in evaluation mode.
        self.model = MyModel()
        self.model.to('cuda')
        self.model.eval()
        self.num_features = 1024

        # Image preprocessing.
        self.transform = self.build_transform(img_size=256)
        # Disable Slideflow standardization,
        # as we are standardizing with transforms.Normalize
        self.preprocess_kwargs = {'standardize': False}

    def dump_config(self):
        return self._dump_config(
            class_name='my_module.MyFeatureExtractor'
        )

You can then use the feature extractor for generating bags for MIL training, as described in Multiple-Instance Learning (MIL).

import slideflow as sf

# Build the feature extractor.
myfeatures = MyFeatureExtractor()

# Load a dataset.
project = sf.load_project(...)
dataset = project.dataset(...)

# Generate bags.
project.generate_feature_bags(myfeatures, dataset)

You can also generate features across whole-slide images, returning a grid of features for each slide. The size of the returned grid reflects the slide’s tile grid. For example, for a slide with 24 columns and 33 rows of tiles, the returned grid will have shape (24, 33, n_features).

>>> myfeatures = MyFeatureExtractor()
>>> wsi = sf.WSI('path/to/wsi', tile_px=256, tile_um=302)
>>> features = myfeatures(wsi)
>>> features.shape
(24, 33, 1024)

Finally, the feature extractor can also be used to perform latent space analysis and generate mosaic maps, as described in Layer Activations.

Slideflow includes a registration system for keeping track of all available feature extractors. To register your feature extractor, use the slideflow.model.extractors.register_torch() decorator.

from slideflow.model.extractors import register_torch

@register_torch
def my_feature_extractor(**kwargs):
    return MyFeatureExtractor(**kwargs)

Once registered, a feature extractor can be built by name:

import slideflow as sf
extractor = sf.build_feature_extractor('my_feature_extractor')

Tensorflow

Tensorflow feature extractors are implemented very similarly to PyTorch feature extractors, extended from slideflow.model.extractors._tensorflow_base.TensorflowFeatureExtractor.

The initializer should create the model and set the expected number of features.

from my_module import MyModel
from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor

class MyFeatureExtractor(TensorflowFeatureExtractor):

    tag = 'my_feature_extractor'  # Unique identifier

    def __init__(self):
        super().__init__()

        # Create the model.
        self.model = MyModel()
        self.num_features = 1024

The initializer is also responsible for registering image preprocessing and transformations. Preprocessing steps are stored in the .preprocess_kwargs dictionary, which should have the keys standardize and transform. If standardize=True, images will be standardized using tf.image.per_image_standardization. If transform is not None, it should be a callable that accepts a single image tensor and returns a transformed image tensor.

For example, to only perform standardization and no further preprocessing:

...

    def __init__(self):
        ...

        # Image preprocessing.
        self.preprocess_kwargs = {
            'standardize': True,
            'transform': None
        }
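
For intuition about what standardize=True does, the arithmetic documented for tf.image.per_image_standardization can be sketched in plain Python: each value has the image mean subtracted and is divided by the larger of the image's standard deviation and 1/sqrt(N), where N is the number of values. The function per_image_standardize below is a hypothetical stand-in that operates on a flat list rather than a tensor.

```python
import math


def per_image_standardize(pixels):
    """Standardize a flat list of pixel values to zero mean and unit variance."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    # Lower bound on the divisor avoids division by zero for uniform images.
    adjusted_std = max(math.sqrt(variance), 1.0 / math.sqrt(n))
    return [(p - mean) / adjusted_std for p in pixels]


tile = [0, 64, 128, 255]  # stand-in for the uint8 values of a tiny image
standardized = per_image_standardize(tile)
uniform = per_image_standardize([7, 7, 7, 7])
print(standardized, uniform)
```

Unlike standardization with fixed channel statistics, this uses statistics computed from each image, so no dataset-level constants are needed.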

To perform standardization and resize images to 256x256:

import tensorflow as tf

@tf.function
def resize_256(x):
    # Resize the image tensor to 256 x 256 pixels.
    return tf.image.resize(x, (256, 256))

...

    def __init__(self):
        ...

        # Image preprocessing.
        self.preprocess_kwargs = {
            'standardize': True,
            'transform': resize_256
        }

Finally, define the .dump_config() method, which returns a dictionary of the configuration parameters needed to regenerate this class. The dictionary should have "class" and "kwargs" keys. This configuration is saved to a JSON configuration file when generating bags for MIL training.

...

    def dump_config(self):
        return {
            'class': 'MyFeatureExtractor',
            'kwargs': {}
        }

The final class should look like this:

from my_module import MyModel
from slideflow.model.extractors._tensorflow_base import TensorflowFeatureExtractor

class MyFeatureExtractor(TensorflowFeatureExtractor):

    tag = 'my_feature_extractor'  # Unique identifier

    def __init__(self):
        super().__init__()

        # Create the model.
        self.model = MyModel()
        self.num_features = 1024

        # Image preprocessing.
        self.preprocess_kwargs = {
            'standardize': True,
            'transform': None
        }

    def dump_config(self):
        return {
            'class': 'MyFeatureExtractor',
            'kwargs': {}
        }

As described above, this feature extractor can then be used to create bags for MIL training, generate features across whole-slide images, or perform feature space analysis across a dataset.

To register your feature extractor, use the slideflow.model.extractors.register_tf() decorator.

from slideflow.model.extractors import register_tf

@register_tf
def my_feature_extractor(**kwargs):
    return MyFeatureExtractor(**kwargs)

…which will allow the feature extractor to be built by name:

import slideflow as sf
extractor = sf.build_feature_extractor('my_feature_extractor')