slideflow.util¶

This module contains a variety of utility functions used throughout the package.

class EasyDict[source]¶: Convenience class that behaves like a dict but allows access with the attribute syntax.
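The attribute-access pattern described above can be sketched with a plain dict subclass. The class name `AttrDict` is hypothetical and the code below is an illustration of the idea, not Slideflow's implementation.

```python
from typing import Any


class AttrDict(dict):
    """Hypothetical stand-in: a dict whose keys can also be used as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name: str, value: Any) -> None:
        # Attribute-style writes are stored as dictionary items.
        self[name] = value

    def __delattr__(self, name: str) -> None:
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


config = AttrDict(tile_px=299)
config.tile_um = 302  # attribute-style write, readable as config["tile_um"]
```

Both `config.tile_px` and `config["tile_px"]` refer to the same stored value.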

class FeatureExtractionProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)[source]¶

get_renderables()[source]¶: Get a number of renderables for the progress display.

class ImgBatchSpeedColumn(batch_size=1, *args, **kwargs)[source]¶

Renders human readable transfer speed.

__init__(batch_size=1, *args, **kwargs)[source]¶

render(task: Task) → Text[source]¶: Show data transfer speed.

class LabeledMofNCompleteColumn(unit: str, *args, **kwargs)[source]¶

Renders a completion column with labels.

__init__(unit: str, *args, **kwargs)[source]¶

render(task: Task) → Text[source]¶: Show completion status with labels.

class MultiprocessProgress(pb)[source]¶

Wrapper for a rich.progress bar that can be shared across processes.

__init__(pb)[source]¶

class MultiprocessProgressTracker(tasks)[source]¶

Wrapper for a rich.progress tracker that can be shared across processes.

__init__(tasks)[source]¶

class TileExtractionProgress(*columns: str | ProgressColumn, console: Console | None = None, auto_refresh: bool = True, refresh_per_second: float = 10, speed_estimate_period: float = 30.0, transient: bool = False, redirect_stdout: bool = True, redirect_stderr: bool = True, get_time: Callable[[], float] | None = None, disable: bool = False, expand: bool = False)[source]¶

get_renderables()[source]¶: Get a number of renderables for the progress display.

class TileExtractionSpeedColumn(table_column: Column | None = None)[source]¶

Renders human readable transfer speed.

render(task: Task) → Text[source]¶: Show data transfer speed.

class ValidJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶

Constructor for JSONEncoder, with sensible defaults.

If skipkeys is false, then it is a TypeError to attempt encoding of keys that are not str, int, float or None. If skipkeys is True, such items are simply skipped.

If ensure_ascii is true, the output is guaranteed to be str objects with all incoming non-ASCII characters escaped. If ensure_ascii is false, the output can contain non-ASCII characters.

If check_circular is true, then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause a RecursionError). Otherwise, no such check takes place.

If allow_nan is true, then NaN, Infinity, and -Infinity will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a ValueError to encode such floats.

If sort_keys is true, then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day-to-day basis.

If indent is a non-negative integer, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0 will only insert newlines. None is the most compact representation.

If specified, separators should be an (item_separator, key_separator) tuple. The default is (', ', ': ') if indent is None and (',', ': ') otherwise. To get the most compact JSON representation, you should specify (',', ':') to eliminate whitespace.

If specified, default is a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a TypeError.

default(obj)[source]¶

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
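As a self-contained illustration of the same pattern, the sketch below subclasses the standard-library `json.JSONEncoder`. The name `IterableEncoder` is hypothetical and is not part of Slideflow; it only shows how a `default` override is picked up through the `cls` argument of `json.dumps`.

```python
import json


class IterableEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes arbitrary iterables as JSON arrays."""

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class raise the TypeError for unsupported objects.
        return super().default(o)


# A generator is not serializable by default, so default() is called for it.
encoded = json.dumps({"slides": (s for s in ["a", "b"])}, cls=IterableEncoder)
```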

about(console=None) → None[source]¶

Print a summary of the slideflow version and active backends.

Example

>>> sf.about()
╭=======================╮
│       Slideflow       │
│    Version: 3.0.0     │
│  Backend: torch       │
│ Slide Backend: cucim  │
│ https://slideflow.dev │
╰=======================╯

Parameters: console (rich.console.Console, optional) – Active console, if one exists. Defaults to None.

batch(iterable: List, n: int = 1) → Iterable[source]¶: Separates an iterable into batches of maximum size n.

batch_generator(iterable: Iterable, n: int = 1) → Iterable[source]¶: Separates an iterable into batches of maximum size n.
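A minimal sketch of the two batching styles, assuming the list version can slice by index while the generator version cannot rely on `len()`. The function names `batch_list` and `batch_iter` are hypothetical; the actual Slideflow implementations may differ.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batch_list(items: List, n: int = 1) -> Iterator[List]:
    """Yield consecutive slices of at most n items from a list."""
    for start in range(0, len(items), n):
        yield items[start:start + n]


def batch_iter(items: Iterable, n: int = 1) -> Iterator[List]:
    """Yield batches of at most n items from any iterable, without using len()."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


list_batches = list(batch_list([1, 2, 3, 4, 5], n=2))
gen_batches = list(batch_iter((i for i in range(5)), n=2))
```

In both cases the final batch may be shorter than n when the input length is not a multiple of n.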

bin_values_to_slide_grid(locations: ndarray, values: ndarray, wsi: WSI, background: str = 'min') → ndarray[source]¶

Bin heatmap values to a slide grid, using tile location information.

Parameters:

locations (np.ndarray) – Array of shape (n_tiles, 2) containing x, y coordinates for all image tiles. Coordinates represent the center for an associated tile, and must be in a grid.
values (np.ndarray) – Array of shape (n_tiles,) containing heatmap values for each tile.
wsi (slideflow.wsi.WSI) – WSI object.

Keyword Arguments:

background (str, optional) – Background strategy for heatmap. Can be ‘min’, ‘mean’, ‘median’, ‘max’, or ‘mask’. Defaults to ‘min’.

choice_input(prompt, valid_choices, default=None, multi_choice=False, input_type=<class 'str'>)[source]¶: Prompts user for multi-choice input.

create_triangles(vertices, hole_vertices=None, hole_points=None)[source]¶

Tessellate a complex polygon, possibly with holes.

Parameters:

vertices – A list of vertices [(x1, y1), (x2, y2), …] defining the polygon boundary.
hole_points – An optional list of points [(hx1, hy1), (hx2, hy2), …] inside each hole in the polygon.

Returns:

A numpy array of vertices for the tessellated triangles.

download_from_tcga(uuid: str, dest: str, message: str = 'Downloading...') → None[source]¶: Download a file from TCGA (GDC) by UUID.

getLoggingLevel()[source]¶: Return the current logging level.

get_ensemble_model_config(model_path: str) → Dict[source]¶: Loads ensemble model configuration JSON file.

get_gan_config(model_path: str) → Dict[source]¶: Loads a GAN training_options.json for an associated network PKL.

get_model_config(model_path: str) → Dict[source]¶: Loads model configuration JSON file.

get_model_normalizer(model_path: str) → StainNormalizer | None[source]¶: Loads and fits normalizer using configuration at a model path.

get_preprocess_fn(model_path: str)[source]¶

Returns a function which preprocesses a uint8 image for a model.

Parameters: model_path (str) – Path to a saved Slideflow model.
Returns: A function which accepts a single image or batch of uint8 images, and returns preprocessed (and stain normalized) float32 images.

get_relative_tfrecord_paths(root: str, directory: str = '') → List[str][source]¶: Returns relative tfrecord paths with respect to the given directory.

get_slide_paths(slides_dir: str) → List[str][source]¶: Get all slide paths from a given directory containing slides.

get_slides_from_model_manifest(model_path: str, dataset: str | None = None) → List[str][source]¶

Get list of slides from a model manifest.

Parameters:

model_path (str) – Path to model from which to load the model manifest.
dataset (str, optional) – Either ‘training’ or ‘validation’. If provided, returns only slides from this dataset. Defaults to None (all slides).

Returns:

List of slide names.

Return type:

list(str)

get_valid_model_dir(root: str) → List[source]¶

Returns the path of the first nested directory within root. This only works when the nested folder name starts with a 5-digit number, like “00000%”.

Examples

If the root contains three directories:

root/00000-foldername/
root/00001-foldername/
root/00002-foldername/

The function returns “root/00000-foldername/”

global_path(root: str, path_string: str)[source]¶: Returns global path from a local path.

infer_stride(locations, wsi)[source]¶

Infer the stride of a tile grid from a set of tile locations.

Parameters:

locations (np.ndarray) – Nx2 array of locations
wsi (slideflow.wsi.WSI) – WSI object

Returns:

inferred stride divisor in pixels

Return type:

float

is_model(path: str) → bool[source]¶: Checks if the given path is a valid Slideflow model.

is_project(path: str) → bool[source]¶: Checks if the given path is a valid Slideflow project.

is_simclr_model_path(path: Any) → bool[source]¶: Checks if the given path is a valid SimCLR model or checkpoint.

is_slide(path: str) → bool[source]¶: Checks if the given path is a supported slide.

is_tensorflow_model_path(path: str) → bool[source]¶: Checks if the given path is a valid Slideflow/Tensorflow model.

is_tile_size_compatible(tile_px1: int, tile_um1: str | int, tile_px2: int, tile_um2: str | int) → bool[source]¶

Check whether tile sizes are compatible.

Compatibility is defined as:

Equal size in pixels
If tile width (tile_um) is defined in microns (int) for both, these must be equal
If tile width (tile_um) is defined as a magnification (str) for both, these must be equal
If one is defined in microns and the other as a magnification, the magnification calculated from the micron width must be within +/- 2 of the stated magnification.

Example 1:
tile_px1=299, tile_um1=302
tile_px2=299, tile_um2=304
Incompatible (unequal micron width)

Example 2:
tile_px1=299, tile_um1=10x
tile_px2=299, tile_um2=9x
Incompatible (unequal magnification)

Example 3:
tile_px1=299, tile_um1=302
tile_px2=299, tile_um2=10x
Compatible (first has an equivalent magnification of 9.9x, which is within +/- 2 of 10x)

Parameters:

tile_px1 (int) – Tile size (in pixels) of first slide.
tile_um1 (int or str) – Tile size (in microns) of first slide. Can also be expressed as a magnification level, e.g. '10x'
tile_px2 (int) – Tile size (in pixels) of second slide.
tile_um2 (int or str) – Tile size (in microns) of second slide. Can also be expressed as a magnification level, e.g. '10x'

Returns:

Whether the tile sizes are compatible.

Return type:

bool
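The compatibility rules above can be sketched as follows. The conversion `10 * tile_px / tile_um` is an assumption inferred from Example 3 (299 px at 302 microns giving roughly 9.9x), and the names `to_magnification` and `tiles_compatible` are hypothetical; this is not Slideflow's implementation.

```python
from typing import Union

TileWidth = Union[int, str]


def to_magnification(tile_px: int, tile_um: TileWidth) -> float:
    """Return a magnification value for a tile size.

    Assumption: 10x corresponds to 1 micron per pixel, which matches
    the 9.9x figure quoted in Example 3.
    """
    if isinstance(tile_um, str):
        return float(tile_um.lower().rstrip("x"))
    return 10 * tile_px / tile_um


def tiles_compatible(px1: int, um1: TileWidth, px2: int, um2: TileWidth,
                     tolerance: float = 2.0) -> bool:
    # Rule 1: pixel sizes must match.
    if px1 != px2:
        return False
    # Rules 2 and 3: both microns or both magnification, so they must be equal.
    if isinstance(um1, str) == isinstance(um2, str):
        return um1 == um2
    # Rule 4: mixed units, so compare magnifications within the tolerance.
    diff = abs(to_magnification(px1, um1) - to_magnification(px2, um2))
    return diff <= tolerance


example_1 = tiles_compatible(299, 302, 299, 304)
example_2 = tiles_compatible(299, "10x", 299, "9x")
example_3 = tiles_compatible(299, 302, 299, "10x")
```

The three calls at the end correspond to Examples 1 through 3.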

is_torch_model_path(path: str) → bool[source]¶: Checks if the given path is a valid Slideflow/PyTorch model.

is_uq_model(model_path: str) → bool[source]¶: Checks if the given model path points to a UQ-enabled model.

isnumeric(val: Any) → bool[source]¶

Check if the given value is numeric (numpy or python).

Tensors will return False.

Specifically checks if the value is a python int or float, or if the value is a numpy array with a numeric dtype (int or float).

load_json(filename: str) → Any[source]¶: Reads JSON data from file.

load_predictions(path: str, **kwargs) → DataFrame[source]¶

Loads a ‘csv’, ‘parquet’ or ‘feather’ file to a pandas dataframe.

Parameters: path (str) – Path to the file to be read.
Returns: The dataframe read from the path.
Return type: df (pd.DataFrame)

location_heatmap(locations: ndarray, values: ndarray, slide: str, tile_px: int, tile_um: int | str, filename: str, *, interpolation: str | None = 'bicubic', cmap: str = 'inferno', norm: str | None = None, background: str = 'min') → None[source]¶

Generate a heatmap for a slide.

Parameters:

locations (np.ndarray) – Array of shape (n_tiles, 2) containing x, y coordinates for all image tiles. Coordinates represent the center for an associated tile, and must be in a grid.
values (np.ndarray) – Array of shape (n_tiles,) containing heatmap values for each tile.
slide (str) – Path to corresponding slide.
tile_px (int) – Tile pixel size.
tile_um (int or str) – Tile width in microns (int) or magnification (str, e.g. “20x”).
filename (str) – Destination filename for heatmap.

Keyword Arguments:

interpolation (str, optional) – Interpolation strategy for smoothing heatmap. Defaults to ‘bicubic’.
cmap (str, optional) – Matplotlib colormap for heatmap. Can be any valid matplotlib colormap. Defaults to ‘inferno’.
norm (str, optional) – Normalization strategy for assigning heatmap values to colors. Either ‘two_slope’, or any other valid value for the norm argument of matplotlib.pyplot.imshow. If ‘two_slope’, normalizes values less than 0 and greater than 0 separately. Defaults to None.

log_manifest(train_tfrecords: List[str] | None = None, val_tfrecords: List[str] | None = None, *, labels: Dict[str, Any] | None = None, filename: str | None = None, remove_extension: bool = True) → str[source]¶

Saves the training manifest in CSV format and returns as a string.

Parameters:

train_tfrecords (list(str), optional) – List of training TFRecords. Defaults to None.
val_tfrecords (list(str), optional) – List of validation TFRecords. Defaults to None.

Keyword Arguments:

labels (dict, optional) – TFRecord outcome labels. Defaults to None.
filename (str, optional) – Path to CSV file to save. Defaults to None.
remove_extension (bool, optional) – Remove file extension from slide names. Defaults to True.

Returns:

Saved manifest in str format.

Return type:

str

make_dir(_dir: str) → None[source]¶: Makes a directory if one does not already exist, in a manner compatible with multithreading.
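One common way to create a directory safely when several threads or processes may race to create it is `os.makedirs` with `exist_ok=True`. The sketch below assumes that approach; `ensure_dir` is a hypothetical name and the actual implementation may differ.

```python
import os
import tempfile


def ensure_dir(path: str) -> None:
    """Create a directory (and any parents) without failing if it already exists."""
    # exist_ok=True avoids the check-then-create race between concurrent callers.
    os.makedirs(path, exist_ok=True)


target = os.path.join(tempfile.mkdtemp(), "tiles", "slide1")
ensure_dir(target)
ensure_dir(target)  # second call is a no-op rather than an error
```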

map_values_to_slide_grid(locations: ndarray, values: ndarray, wsi: WSI, background: str = 'min', *, interpolation: str | None = 'bicubic') → ndarray[source]¶

Map heatmap values to a slide grid, using tile location information.

Parameters:

locations (np.ndarray) – Array of shape (n_tiles, 2) containing x, y coordinates for all image tiles. Coordinates represent the center for an associated tile, and must be in a grid.
values (np.ndarray) – Array of shape (n_tiles,) containing heatmap values for each tile.
wsi (slideflow.wsi.WSI) – WSI object.

Keyword Arguments:

background (str, optional) – Background strategy for heatmap. Can be ‘min’, ‘mean’, ‘median’, ‘max’, or ‘mask’. Defaults to ‘min’.
interpolation (str, optional) – Interpolation strategy for smoothing heatmap. Defaults to ‘bicubic’.

md5(path: str) → str[source]¶: Calculate and return MD5 checksum for a file.
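A minimal sketch of a file checksum using the standard-library `hashlib`, reading in chunks so that large slide files need not fit in memory. The name `file_md5` and the chunk size are assumptions for illustration only.

```python
import hashlib
import os
import tempfile


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Write a small temporary file and checksum it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
checksum = file_md5(tmp.name)
os.remove(tmp.name)
```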

multi_warn(arr: List, compare: Callable, msg: Callable | str) → int[source]¶

Logs multiple warnings.

Parameters:

arr (List) – Array to compare.
compare (Callable) – Comparison to perform on each item in the array. If it returns True, a warning is logged.
msg (Callable or str) – Warning message.

Returns:

Number of warnings.

Return type:

int
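The behavior described above can be sketched as a loop that applies `compare` to each item and counts the warnings issued. How a callable `msg` is used is an assumption here (it is called with the offending item to build the message), and `warn_each` is a hypothetical name.

```python
import logging
from typing import Callable, List, Union

log = logging.getLogger("multi_warn_sketch")  # hypothetical logger name


def warn_each(arr: List, compare: Callable, msg: Union[Callable, str]) -> int:
    """Log a warning for every item where compare(item) is true; return the count."""
    count = 0
    for item in arr:
        if compare(item):
            # Assumption: a callable msg builds the message from the item.
            log.warning(msg(item) if callable(msg) else msg)
            count += 1
    return count


n_warned = warn_each(
    [5, 0, 12, 0],
    lambda x: x == 0,
    lambda x: f"Empty entry found: {x}",
)
```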

path_input(prompt: str, root: str, default: str | None = None, create_on_invalid: bool = False, filetype: str | None = None, verify: bool = True) → str[source]¶: Prompts user for directory input.

path_to_ext(path: str) → str[source]¶: Returns extension of a file path string.

path_to_name(path: str) → str[source]¶: Returns name of a file, without extension, from a given full path string.
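The two path helpers above can be approximated with `os.path`. Whether the real `path_to_ext` includes the leading dot is not stated here, so this sketch omits it as an assumption; `name_of` and `ext_of` are hypothetical names.

```python
import os


def name_of(path: str) -> str:
    """Return the file name without its directory or final extension."""
    return os.path.splitext(os.path.basename(path))[0]


def ext_of(path: str) -> str:
    """Return the final extension without the leading dot ('' if there is none)."""
    return os.path.splitext(path)[1].lstrip(".")


slide_name = name_of("/data/slides/slide1.svs")
slide_ext = ext_of("/data/slides/slide1.svs")
```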

read_annotations(path: str) → Tuple[List[str], List[Dict]][source]¶: Read an annotations file.

relative_path(path: str, root: str)[source]¶: Returns a relative path, from a given root directory.

setLoggingLevel(level)[source]¶

Set the logging level.

Uses standard python logging levels:

50: CRITICAL
40: ERROR
30: WARNING
20: INFO
10: DEBUG
0: NOTSET

Parameters: level (int) – Logging level numeric value.
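The numeric values listed above are the standard-library `logging` constants. The sketch below shows the mapping on an ordinary logger; the logger name is hypothetical and unrelated to Slideflow's own logger.

```python
import logging

logger = logging.getLogger("logging_level_sketch")  # hypothetical logger name

# 20 is the numeric value of logging.INFO.
logger.setLevel(20)
level_name = logging.getLevelName(logger.getEffectiveLevel())

# Messages below the configured level are filtered out.
debug_enabled = logger.isEnabledFor(logging.DEBUG)
```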

set_ignore_sigint()[source]¶: Ignore keyboard interrupts.

split_list(a: List, n: int) → List[List][source]¶: Splits a list into n components.
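One way to split a list into n components of near-equal size is shown below. How the real function distributes leftover items is not documented here; this sketch assumes the earlier components receive them. `split_evenly` is a hypothetical name.

```python
from typing import List


def split_evenly(a: List, n: int) -> List[List]:
    """Split a into n contiguous parts whose lengths differ by at most one."""
    k, m = divmod(len(a), n)  # k items per part, with m parts getting one extra
    return [a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in range(n)]


parts = split_evenly(list(range(7)), 3)
```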

tfrecord_heatmap(tfrecord: str, slide: str, tile_px: int, tile_um: int | str, tile_dict: Dict[int, float], filename: str, **kwargs) → None[source]¶

Creates a tfrecord-based WSI heatmap using a dictionary of tile values for heatmap display.

Parameters:

tfrecord (str) – Path to tfrecord.
slide (str) – Path to whole-slide image.
tile_dict (dict) – Dictionary mapping tfrecord indices to a tile-level value for display in heatmap format.
tile_px (int) – Tile width in pixels.
tile_um (int or str) – Tile width in microns (int) or magnification (str, e.g. “20x”).
filename (str) – Destination filename for heatmap.

tile_size_label(tile_px: int, tile_um: str | int) → str[source]¶: Return the string label of the given tile size.

to_onehot(val: int, max: int) → ndarray[source]¶

Converts a value to a one-hot encoding.

Parameters:

val (int) – Value to encode
max (int) – Maximum value (length of onehot encoding)
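A dependency-free sketch of one-hot encoding, assuming `val` is a zero-based index. The documented function returns an ndarray; this illustration returns a plain list instead, and `onehot` is a hypothetical name.

```python
from typing import List


def onehot(val: int, length: int) -> List[int]:
    """Return a list of zeros of the given length with a 1 at index val."""
    if not 0 <= val < length:
        raise ValueError(f"val must be in [0, {length}), got {val}")
    encoded = [0] * length
    encoded[val] = 1
    return encoded


encoded_label = onehot(2, 4)
```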

update_results_log(results_log_path: str, model_name: str, results_dict: Dict) → None[source]¶: Dynamically update results_log when recording training metrics.

write_json(data: Any, filename: str) → None[source]¶

Write data to JSON file.

Parameters:

data (Any) – Data to write.
filename (str) – Path to JSON file.
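A round trip through a JSON file can be sketched with the standard-library `json` module. The helper names `save_json` and `read_json` and the indent setting are assumptions for illustration, not Slideflow's implementation.

```python
import json
import os
import tempfile
from typing import Any


def save_json(data: Any, filename: str) -> None:
    """Write data to a JSON file."""
    with open(filename, "w") as f:
        json.dump(data, f, indent=1)


def read_json(filename: str) -> Any:
    """Read data back from a JSON file."""
    with open(filename) as f:
        return json.load(f)


params_path = os.path.join(tempfile.mkdtemp(), "params.json")
save_json({"tile_px": 299, "tile_um": "10x"}, params_path)
restored = read_json(params_path)
```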

yes_no_input(prompt: str, default: str = 'no') → bool[source]¶: Prompts user for yes/no input.