slideflow.io
This module contains utility functions for working with TFRecords, cross-compatible with both Tensorflow and PyTorch.
Functions included in this module assist with processing TFRecords, detecting image and data format, extracting tiles, splitting and merging TFRecords, and a variety of other manipulations.
Additional Tensorflow-specific TFRecord reading/writing utility functions are available in slideflow.io.tensorflow, and additional PyTorch-specific functions are in slideflow.io.torch.
- convert_dtype(img: Any, dtype: dtype | tf.dtypes.DType | torch.dtype) -> Any
Converts an image from one type to another.
Images can be converted to and from numpy arrays, Torch Tensors and Tensorflow Tensors. Images can also be converted from standardized float images to RGB uint8 images, and vice versa.
Supported formats for starting and ending dtype include:
np.uint8
Image in RGB (WHC) uint8 format.
np.float32
RGB (WHC) image. If the source image is a numpy uint8 or torch uint8, it will be standardized with (img / 127.5) - 1. If the source image is a tensorflow image, standardization uses tf.image.per_image_standardization().
torch.uint8
Image in RGB (CWH) uint8 format.
torch.float32
Image converted with (img / 127.5) - 1 and WHC -> CWH.
tf.uint8
Image in RGB (WHC) uint8 format.
tf.float32
Image converted with tf.image.per_image_standardization().
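The uint8-to-float arithmetic and the WHC -> CWH reordering described above can be illustrated with a small sketch. It uses plain nested lists rather than numpy or torch so that it runs anywhere; `standardize` and `whc_to_cwh` are hypothetical helper names for illustration, not slideflow functions.

```python
def standardize(value):
    """Map a uint8 intensity (0-255) into the range [-1, 1]."""
    return (value / 127.5) - 1


def whc_to_cwh(img):
    """Reorder a nested-list image from (W, H, C) to (C, W, H)."""
    width, height, channels = len(img), len(img[0]), len(img[0][0])
    return [
        [[img[w][h][c] for h in range(height)] for w in range(width)]
        for c in range(channels)
    ]


# A hypothetical 2x1 RGB image stored as nested lists in WHC order.
image = [[[0, 255, 51]], [[255, 0, 102]]]
standardized = [[[standardize(v) for v in px] for px in col] for col in image]
channels_first = whc_to_cwh(standardized)
```

With this formula, 0 maps to -1.0 and 255 maps to 1.0, and the result has one plane per channel.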
- detect_tfrecord_format(tfr: str) -> Tuple[List[str] | None, str | None]
Detects tfrecord format.
- Parameters:
tfr (str) – Path to tfrecord.
- Returns:
A tuple containing
list(str): List of detected features.
str: Image file type (png/jpeg)
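One plausible way to tell PNG from JPEG image data is to inspect the leading signature bytes of the encoded image. The sketch below shows that idea only. It is not taken from slideflow's implementation, and `guess_image_type` is a hypothetical name.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG file signature
JPEG_SIGNATURE = b"\xff\xd8\xff"       # JPEG start-of-image marker prefix


def guess_image_type(image_raw):
    """Return 'png', 'jpeg', or None based on the leading bytes."""
    if image_raw.startswith(PNG_SIGNATURE):
        return "png"
    if image_raw.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return None
```

Returning None for unrecognized data mirrors the `str | None` return type in the signature above.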
- extract_tiles(tfrecord: str, destination: str) -> None
Extracts images within a TFRecord to a destination folder.
- get_locations_from_tfrecord(filename: str) -> List[Tuple[int, int]]
Return list of tile locations (X, Y) for all items in the TFRecord.
- get_tfrecord_by_index(tfrecord: str, index: int, *, compression_type: str | None = None, index_array: ndarray | None = None) -> Dict
Read a specific record in a TFRecord file.
- Parameters:
- Returns:
A dictionary mapping record names (e.g., 'slide', 'image_raw', 'loc_x', and 'loc_y') to their values. 'slide' will be a string, 'image_raw' will be bytes, and 'loc_x' and 'loc_y' will be int.
- Raises:
slideflow.error.EmptyTFRecordsError – If the file is empty.
slideflow.error.InvalidTFRecordIndex – If the given index cannot be found.
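To show why a precomputed index helps with random access, here is a minimal sketch of reading one record by position from TFRecord-framed data. Each record in the TFRecord container is stored as an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload. The sketch makes several assumptions: it skips CRC verification, it does not decode the payload (normally a serialized `tf.train.Example`), the demo data uses zero-filled CRC placeholders that a real reader would reject, and it raises built-in exceptions instead of the slideflow errors listed above. All function names are hypothetical.

```python
import io
import struct


def build_index(stream):
    """Return the byte offset of every record in a TFRecord-framed stream."""
    offsets = []
    stream.seek(0)
    while True:
        start = stream.tell()
        header = stream.read(12)              # uint64 length + uint32 length CRC
        if len(header) < 12:
            break
        (length,) = struct.unpack("<Q", header[:8])
        stream.seek(length + 4, io.SEEK_CUR)  # skip payload + payload CRC
        offsets.append(start)
    return offsets


def read_record(stream, index, offsets=None):
    """Return the raw payload of record number `index` (CRCs not verified)."""
    offsets = build_index(stream) if offsets is None else offsets
    if not offsets:
        raise ValueError("file contains no records")
    if not 0 <= index < len(offsets):
        raise IndexError(f"record {index} not found")
    stream.seek(offsets[index])
    (length,) = struct.unpack("<Q", stream.read(8))
    stream.seek(4, io.SEEK_CUR)               # skip the length CRC
    return stream.read(length)


def frame(payload):
    """Frame a payload for the demo; both CRC fields are zero placeholders."""
    return struct.pack("<Q", len(payload)) + b"\0" * 4 + payload + b"\0" * 4


demo = io.BytesIO(frame(b"first") + frame(b"second") + frame(b"third"))
offsets = build_index(demo)
second = read_record(demo, 1, offsets)
```

Passing a cached `offsets` list avoids rescanning the file on every lookup, which is presumably the role `index_array` plays in the signature above. The length of that list is also the number of records.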
- get_tfrecord_by_location(tfrecord: str, location: Tuple[int, int], decode: bool = True, *, locations_array: List[Tuple[int, int]] | None = None, index_array: ndarray | None = None) -> Any
Reads and returns an individual record from a tfrecord by location, including slide name and processed image data.
- Parameters:
- Returns:
Unprocessed raw TFRecord bytes if decode=False, otherwise a tuple containing (slide, image), where image is a uint8 Tensor.
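A location lookup can be reduced to an index lookup when a list of tile locations is available. The sketch below assumes that `locations_array` is ordered the same way as the records in the file, which is not stated here. `index_for_location` is a hypothetical helper.

```python
def index_for_location(location, locations_array):
    """Return the record index whose (x, y) tile location matches `location`."""
    wanted = tuple(location)
    for i, loc in enumerate(locations_array):
        if tuple(loc) == wanted:
            return i
    raise IndexError(f"location {wanted} not found")


# Hypothetical locations, in the same order as the records they describe.
locations = [(0, 0), (256, 0), (0, 256), (256, 256)]
record_index = index_for_location((0, 256), locations)
```

The resulting index could then be passed to an index-based reader, so supplying a cached locations list saves one full pass over the file.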
- get_tfrecord_parser(tfrecord_path: str, features_to_return: Iterable[str] = None, decode_images: bool = True, standardize: bool = False, normalizer: StainNormalizer | None = None, augment: bool = False, **kwargs) -> Callable
Gets a tfrecord parser using the dareblopy reader. This is the Torch implementation, which differs from the one in sf.io.tensorflow.
- Parameters:
tfrecord_path (str) – Path to tfrecord to parse.
features_to_return (list or dict, optional) – Designates format for how features should be returned from parser. If a list of feature names is provided, the parsing function will return tfrecord features as a list in the order provided. If a dictionary of labels (keys) mapping to feature names (values) is provided, features will be returned from the parser as a dictionary matching the same format. If None, will return all features as a list.
decode_images (bool, optional) – Decode raw image strings into image arrays. Defaults to True.
standardize (bool, optional) – Standardize images into the range (0,1). Defaults to False.
normalizer (slideflow.norm.StainNormalizer) – Stain normalizer to use on images. Defaults to None.
augment (str or bool) – Image augmentations to perform. Augmentations include:
'x': Random horizontal flip
'y': Random vertical flip
'r': Random 90-degree rotation
'j': Random JPEG compression (50% chance to compress with quality between 50-100)
'b': Random Gaussian blur (10% chance to blur with sigma between 0.5-2.0)
Combine letters to define augmentations, such as 'xyrj'. A value of True will use 'xyrjb'. Note: this function does not support stain augmentation.
- Returns:
A tuple containing
func: Parsing function
dict: Detected feature description for the tfrecord
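The list-versus-dictionary behavior of `features_to_return` and the letter-based `augment` argument can be sketched as follows. This is only an illustration of the documented semantics, not slideflow's code. `resolve_augment` and `select_features` are hypothetical names, and rejecting unlisted letters is a choice made for this sketch.

```python
DEFAULT_AUGMENT = "xyrjb"  # what the text says augment=True expands to


def resolve_augment(augment):
    """Normalize the augment argument into a string of augmentation letters."""
    if augment is True:
        return DEFAULT_AUGMENT
    if not augment:
        return ""
    unknown = set(augment) - set(DEFAULT_AUGMENT)
    if unknown:
        raise ValueError(f"unsupported augmentation(s): {sorted(unknown)}")
    return augment


def select_features(record, features_to_return=None):
    """Arrange parsed features as a list or dict, mirroring the request."""
    if features_to_return is None:
        return list(record.values())          # all features, as a list
    if isinstance(features_to_return, dict):
        # Keys are output labels; values are feature names in the record.
        return {label: record[name] for label, name in features_to_return.items()}
    return [record[name] for name in features_to_return]


# Hypothetical parsed record for demonstration.
record = {"slide": "slide_A", "image_raw": b"...", "loc_x": 10, "loc_y": 20}
as_list = select_features(record, ["loc_x", "slide"])
as_dict = select_features(record, {"x": "loc_x", "name": "slide"})
```

A list request preserves the order given, while a dictionary request lets the caller rename features on the way out.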
- get_tfrecord_length(tfrecord: str) -> int
Return the number of records in a TFRecord file.
Uses an index file if available, otherwise iterates through the file to find the total record length.
- read_and_return_record(record: bytes, parser: Callable, assign_slide: str | None = None) -> Dict
Process raw TFRecord bytes into a format that can be written with tf.io.TFRecordWriter.
- Parameters:
- Returns:
Dictionary mapping record key to a tuple containing (bytes, dtype).
- serialized_record(slide: bytes, image_raw: bytes, loc_x: int = 0, loc_y: int = 0)
Returns a serialized example for TFRecord storage, ready to be written by a TFRecordWriter.
- tfrecord_has_locations(filename: str, check_x: int = True, check_y: bool = False) -> bool
Check if a given TFRecord has location information stored.
- update_manifest_at_dir(directory: str, force_update: bool = False) -> str | Dict | None
Log number of tiles in each TFRecord file present in the given directory and all subdirectories, saving manifest to file within the parent directory.
- write_tfrecords_multi(input_directory: str, output_directory: str) -> None
Write multiple tfrecords, one for each slide, from a directory of images.
Scans a folder for subfolders, assumes subfolders are slide names. Assembles all image tiles within subfolders, assuming the subfolder is the slide name. Collects all image tiles and exports into multiple tfrecord files, one for each slide.
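The folder convention described above (one subfolder per slide, tiles inside) can be sketched with the standard library. This covers only the scanning and grouping step, not the TFRecord writing. The accepted file extensions and the name `group_tiles_by_slide` are assumptions made for illustration.

```python
from pathlib import Path
import tempfile

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed tile extensions


def group_tiles_by_slide(input_directory):
    """Map each subfolder name (treated as a slide name) to its tile paths."""
    groups = {}
    for subfolder in sorted(Path(input_directory).iterdir()):
        if not subfolder.is_dir():
            continue
        tiles = sorted(
            p for p in subfolder.iterdir()
            if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
        if tiles:
            groups[subfolder.name] = tiles
    return groups


# Build a throwaway directory tree to demonstrate the grouping.
with tempfile.TemporaryDirectory() as root:
    layout = {"slide_A": ["1.jpg", "2.jpg"], "slide_B": ["1.png"]}
    for slide, names in layout.items():
        (Path(root) / slide).mkdir()
        for name in names:
            (Path(root) / slide / name).write_bytes(b"")
    tile_counts = {s: len(t) for s, t in group_tiles_by_slide(root).items()}
```

Each key in the resulting mapping would correspond to one output tfrecord file in the multi-file case.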
- write_tfrecords_single(input_directory: str, output_directory: str, filename: str, slide: str) -> int
Scans a folder for image tiles, annotates them using the provided slide, and exports them into a single tfrecord file.
- write_tfrecords_merge(input_directory: str, output_directory: str, filename: str) -> int
Scans a folder for subfolders, assumes subfolders are slide names. Assembles all image tiles within subfolders and labels using the provided annotation_dict, assuming the subfolder is the slide name. Collects all image tiles and exports into a single tfrecord file.
slideflow.io.preservedsite
- generate_crossfolds(*args, method='auto', **kwargs)
Generates site preserved cross-folds, balanced on a given category.
Preserved-site cross-validation is performed as described in the manuscript https://doi.org/10.1038/s41467-021-24698-1.
Available solvers include Bonmin and CPLEX. The solver can be manually set with method. If not provided, the solver will default to CPLEX if available, and Bonmin as a fallback.
CPLEX is proprietary software by IBM.
Bonmin can be installed with:
conda install -c conda-forge coinbonmin
- Parameters:
data (pandas.DataFrame) – Dataframe with slides that must be split into crossfolds.
category (str) – The column in data to stratify by.
k (int) – Number of crossfolds for splitting. Defaults to 3.
target_column (str) – Name for target column to contain the assigned crossfolds for each patient in the output dataframe.
timelimit – Maximum time to spend solving.
- Returns:
Dataframe with a new column, 'CV3', that contains values 1 - 3, indicating the assigned crossfold.
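The core constraint of preserved-site cross-validation is that every sample from a given site lands in the same fold, while category counts stay as even as possible across folds. The sketch below illustrates that constraint with a simple greedy heuristic. It is not the Bonmin/CPLEX optimization this function performs, it works on plain dictionaries rather than a pandas DataFrame, and the `site` column name and `greedy_site_folds` function are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def greedy_site_folds(rows, category, site="site", k=3, target_column="CV3"):
    """Assign whole sites to folds 1..k, greedily balancing category counts.

    A simple heuristic for illustration, NOT the solver-based method.
    """
    site_counts = defaultdict(Counter)
    for row in rows:
        site_counts[row[site]][row[category]] += 1
    categories = {c for counts in site_counts.values() for c in counts}

    fold_counts = [Counter() for _ in range(k)]
    assignment = {}
    # Place the largest sites first; ties are broken by site name.
    ordered = sorted(site_counts, key=lambda s: (-sum(site_counts[s].values()), s))
    for s in ordered:
        def imbalance_if_added(fold):
            # Sum over categories of (max - min) fold count after adding site s.
            total = 0
            for cat in categories:
                counts = [
                    fold_counts[f][cat] + (site_counts[s][cat] if f == fold else 0)
                    for f in range(k)
                ]
                total += max(counts) - min(counts)
            return total

        best = min(range(k), key=imbalance_if_added)
        fold_counts[best].update(site_counts[s])
        assignment[s] = best + 1
    # Return copies of the rows with the fold stored under target_column.
    return [{**row, target_column: assignment[row[site]]} for row in rows]


# Hypothetical data: four sites with mixed labels.
rows = [
    {"site": "A", "label": "pos"}, {"site": "A", "label": "neg"},
    {"site": "B", "label": "pos"}, {"site": "B", "label": "pos"},
    {"site": "C", "label": "neg"}, {"site": "C", "label": "neg"},
    {"site": "D", "label": "pos"},
]
assigned = greedy_site_folds(rows, category="label", k=3)
```

A greedy pass gives no optimality guarantee, which is one reason a dedicated solver is used for the actual cross-fold generation.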