Data
Loading datasets
novae.load_dataset(pattern=None, tissue=None, species=None, technology=None, custom_filter=None, top_k=None, dry_run=False)
Automatically load slides from the Novae dataset repository.
Selecting slides
The function arguments let you filter the slides by tissue, species, technology, and name pattern. Internally, the function reads the dataset metadata file to select the slides that match the provided filters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
pattern | str \| None | Optional pattern to match the slide names. | None |
tissue | list[str] \| str \| None | Optional tissue (or tissue list) to filter the slides. E.g., … | None |
species | list[str] \| str \| None | Optional species (or species list) to filter the slides. E.g., … | None |
technology | list[str] \| str \| None | Optional technology (or technology list) to filter the slides. E.g., … | None |
custom_filter | Callable[[DataFrame], Series] \| None | Custom filter function that takes the metadata DataFrame (the dataset metadata file mentioned above) and returns a boolean Series indicating which rows to keep. | None |
top_k | int \| None | Optional number of slides to keep. If … | None |
dry_run | bool | If … | False |
Returns:
Type | Description |
---|---|
list[AnnData] | A list of … |
Source code in novae/data/_load/_hf.py
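The `custom_filter` argument receives the metadata table as a pandas `DataFrame` and must return a boolean `Series`. The sketch below shows what such a filter could look like and how it might combine with a tissue filter. The column names (`slide_name`, `tissue`, `species`, `technology`, `n_cells`), the toy rows, and the helper `select_slides` are hypothetical stand-ins introduced for illustration; check the actual metadata file for its real columns. The way the filters are combined here is an assumption, not novae's implementation.

```python
import pandas as pd

# Hypothetical stand-in for the dataset metadata file (column names are assumptions).
metadata = pd.DataFrame(
    {
        "slide_name": ["slide_a", "slide_b", "slide_c", "slide_d"],
        "tissue": ["brain", "colon", "brain", "lung"],
        "species": ["mouse", "human", "human", "human"],
        "technology": ["xenium", "merscope", "xenium", "cosmx"],
        "n_cells": [50_000, 250_000, 180_000, 90_000],
    }
)


def large_slides_only(df: pd.DataFrame) -> pd.Series:
    """Example custom filter: keep slides with more than 100k cells."""
    return df["n_cells"] > 100_000


def select_slides(df: pd.DataFrame, tissue=None, custom_filter=None) -> pd.DataFrame:
    """Hypothetical sketch: combine a tissue filter (str or list) with a custom filter."""
    mask = pd.Series(True, index=df.index)
    if tissue is not None:
        tissues = [tissue] if isinstance(tissue, str) else list(tissue)
        mask &= df["tissue"].isin(tissues)
    if custom_filter is not None:
        mask &= custom_filter(df)
    return df[mask]


selected = select_slides(metadata, tissue=["brain", "colon"], custom_filter=large_slides_only)
print(selected["slide_name"].tolist())  # ['slide_b', 'slide_c']
```

With novae itself, a function such as `large_slides_only` would be passed directly as `custom_filter=large_slides_only`, provided it references columns that exist in the real metadata file.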
novae.toy_dataset(n_panels=3, n_domains=4, n_slides_per_panel=1, xmax=500, n_vars=100, n_drop=20, step=20, panel_shift_lambda=5, slide_shift_lambda=1.5, domain_shift_lambda=2.0, slide_ids_unique=True, compute_spatial_neighbors=False, merge_last_domain_even_slide=False)
Creates a toy dataset, useful for debugging or testing.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
n_panels | int | Number of panels. Each panel will correspond to one output … | 3 |
n_domains | int | Number of domains. | 4 |
n_slides_per_panel | int | Number of slides per panel. | 1 |
xmax | int | Maximum value for the spatial coordinates (the larger, the more cells). | 500 |
n_vars | int | Maximum number of genes per panel. | 100 |
n_drop | int | Number of genes that are randomly removed for each … | 20 |
step | int | Step between cells in their spatial coordinates. | 20 |
panel_shift_lambda | float | Lambda used in the exponential distribution for each panel. | 5 |
slide_shift_lambda | float | Lambda used in the exponential distribution for each slide. | 1.5 |
domain_shift_lambda | float | Lambda used in the exponential distribution for each domain. | 2.0 |
slide_ids_unique | bool | Whether to ensure that slide IDs are unique. | True |
compute_spatial_neighbors | bool | Whether to compute the spatial neighbors graph. We remove some of the edges of one node for testing purposes. | False |
Returns:
Type | Description |
---|---|
list[AnnData] | A list of … |
Source code in novae/data/_load/_toy.py
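The parameters `xmax`, `step`, and the `*_shift_lambda` values hint at how the toy data is laid out. The following is a hypothetical illustration, not novae's implementation: it places cells on a regular grid with spacing `step` up to `xmax`, and draws panel-, slide-, and domain-level expression shifts from exponential distributions. The documentation above does not say whether lambda is a rate or a scale; this sketch assumes it is the rate, so NumPy's `scale` is `1 / lambda`. The helper names `toy_grid` and `exponential_shifts` are made up for this example.

```python
import numpy as np


def toy_grid(xmax: int = 500, step: int = 20) -> np.ndarray:
    """Regular grid of cell coordinates in [0, xmax) with spacing `step` (hypothetical layout)."""
    xs = np.arange(0, xmax, step)
    xx, yy = np.meshgrid(xs, xs)
    return np.column_stack([xx.ravel(), yy.ravel()])


def exponential_shifts(n: int, n_vars: int, lam: float, rng: np.random.Generator) -> np.ndarray:
    """Draw one non-negative shift per gene for each of `n` groups.

    Assumes `lam` is the rate of the exponential distribution (mean 1 / lam).
    """
    return rng.exponential(scale=1.0 / lam, size=(n, n_vars))


rng = np.random.default_rng(0)
coords = toy_grid(xmax=500, step=20)
panel_shifts = exponential_shifts(3, 100, lam=5, rng=rng)  # n_panels=3, n_vars=100
slide_shifts = exponential_shifts(3 * 1, 100, lam=1.5, rng=rng)  # n_panels * n_slides_per_panel
domain_shifts = exponential_shifts(4, 100, lam=2.0, rng=rng)  # n_domains=4

print(coords.shape)  # (625, 2): 25 positions per axis
print(panel_shifts.shape, slide_shifts.shape, domain_shifts.shape)  # (3, 100) (3, 100) (4, 100)
```

In this layout, increasing `xmax` or decreasing `step` grows the number of cells quadratically, which is consistent with "the larger, the more cells".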
Preprocessing
novae.quantile_scaling(adata, multiplier=5, quantile=0.2, per_slide=True)
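Preprocess fluorescence data from `adata.X` using quantiles of expression.
For each column `X`, we compute `asinh(X / (5 * Q(0.2, X)))` and store the result back in `adata.X`. Here `Q(0.2, X)` is the 0.2 quantile of `X`; 5 and 0.2 are the default values of `multiplier` and `quantile`.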
Parameters:
Name | Type | Description | Default |
---|---|---|---|
adata | AnnData \| list[AnnData] | An … | required |
multiplier | float | The multiplier for the quantile. | 5 |
quantile | float | The quantile to compute. | 0.2 |
per_slide | bool | Whether to compute the quantile per slide. If … | True |
Source code in novae/data/preprocess.py
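The transformation can be sketched with NumPy on a dense cells × genes matrix. This is a hypothetical re-implementation for illustration only: `quantile_scaling_sketch` and its `slide_ids` argument are not part of novae, and the fallback for columns whose quantile is zero is an assumption made here to avoid dividing by zero.

```python
import numpy as np


def quantile_scaling_sketch(X, multiplier=5.0, quantile=0.2, slide_ids=None):
    """Apply asinh(X / (multiplier * Q(quantile, X))) column-wise.

    If `slide_ids` is given, quantiles are computed separately on the rows of
    each slide, mimicking per_slide=True. Columns whose quantile is 0 are
    divided by 1 instead (an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    if slide_ids is None:
        divider = multiplier * np.quantile(X, quantile, axis=0)
        divider[divider == 0] = 1.0
        return np.arcsinh(X / divider)

    slide_ids = np.asarray(slide_ids)
    out = np.empty_like(X)
    for slide in np.unique(slide_ids):
        rows = slide_ids == slide
        out[rows] = quantile_scaling_sketch(X[rows], multiplier, quantile)
    return out


X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])
scaled = quantile_scaling_sketch(X)  # one quantile per column over all cells
scaled_per_slide = quantile_scaling_sketch(X, slide_ids=["a", "a", "b", "b", "b"])
```

Computing the quantile per slide makes each slide's scaling independent of the others, which is presumably why `per_slide` defaults to `True` when slides differ in overall intensity.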
novae.compute_histo_embeddings(sdata, model='conch', table_key='table', patch_overlap_ratio=0.5, image_key=None, device=None, batch_size=32)
Compute histology embeddings for a given model on a grid of overlapping patches.
It adds a new `AnnData` object containing the patch embeddings to the `SpatialData` object, and adds a column to the cells table with the index of the closest patch.
Installation
This function requires the `multimodal` extra. You can install it with `pip install novae[multimodal]`. If you use the CONCH model (default), you also need to install the `conch` extra with `pip install 'novae[multimodal,conch]'`.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdata | SpatialData | A … | required |
model | str \| Callable | The model to use for computing embeddings. See the sopa documentation for more details. | 'conch' |
table_key | str | Name of the … | 'table' |
patch_overlap_ratio | float | Ratio of overlap between patches. | 0.5 |
image_key | str \| None | Name of the histology image. If None, the function will try to find the image key automatically. | None |
device | str \| None | Torch device to use for computation. | None |
batch_size | int | Mini-batch size for computation. | 32 |
Source code in novae/data/_embeddings/_histo.py
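The idea of overlapping patches and closest-patch assignment can be illustrated with NumPy. This sketch is hypothetical: it assumes square patches of side `patch_width` (a name introduced here, not a documented parameter) tiled with stride `patch_width * (1 - patch_overlap_ratio)`, and assigns each cell to the patch whose center is nearest. Sopa's actual tiling may differ, for example at image borders.

```python
import numpy as np


def patch_centers(width: float, height: float, patch_width: float, patch_overlap_ratio: float) -> np.ndarray:
    """Centers of square patches tiled over a width x height image with the given overlap."""
    stride = patch_width * (1 - patch_overlap_ratio)
    # Patch origins go from 0 up to the last position where a full patch still fits.
    xs = np.arange(0, width - patch_width + 1e-9, stride) + patch_width / 2
    ys = np.arange(0, height - patch_width + 1e-9, stride) + patch_width / 2
    xx, yy = np.meshgrid(xs, ys)
    return np.column_stack([xx.ravel(), yy.ravel()])


def closest_patch_index(cell_xy: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest patch center for each cell (brute-force Euclidean distance)."""
    d2 = ((cell_xy[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


centers = patch_centers(400, 400, patch_width=200, patch_overlap_ratio=0.5)  # stride 100
cells = np.array([[110.0, 90.0], [290.0, 310.0], [205.0, 95.0]])
patch_index = closest_patch_index(cells, centers)
```

With a 0.5 overlap, each patch shares half its width with its neighbor, so a 400 × 400 image yields a 3 × 3 grid of 200-pixel patches instead of 2 × 2 without overlap.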
novae.compute_histo_pca(sdatas, n_components=50, table_key='table')
Run PCA on the histology embeddings associated with each cell (taken from the closest patch).
The embedding is stored in `adata.obsm["histo_embeddings"]`, where `adata` is the table of cell expression.
Info
You need to run novae.data.compute_histo_embeddings before running this function.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdatas | Union[SpatialData, list[SpatialData]] | One or several … | required |
n_components | int | Number of components for the PCA. | 50 |
table_key | str | Name of the … | 'table' |
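Conceptually, each cell inherits the embedding of its closest patch, and PCA then reduces those per-cell embeddings to `n_components` dimensions. The sketch below shows this with plain NumPy (centering followed by an SVD). The function `histo_pca_sketch`, the array names, and the random data are placeholders; novae may rely on a different PCA implementation. The returned matrix plays the role of what ends up in `adata.obsm["histo_embeddings"]`.

```python
import numpy as np


def histo_pca_sketch(patch_embeddings: np.ndarray, closest_patch: np.ndarray, n_components: int = 50) -> np.ndarray:
    """Project per-cell histology embeddings onto their first principal components.

    patch_embeddings: (n_patches, dim) embeddings, one row per patch.
    closest_patch: (n_cells,) index of the closest patch for each cell.
    """
    cell_embeddings = patch_embeddings[closest_patch]  # (n_cells, dim)
    centered = cell_embeddings - cell_embeddings.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


rng = np.random.default_rng(0)
patch_embeddings = rng.normal(size=(30, 64))  # 30 patches, 64-dim embeddings (placeholder)
closest_patch = rng.integers(0, 30, size=200)  # closest patch index for 200 cells
histo = histo_pca_sketch(patch_embeddings, closest_patch, n_components=10)
print(histo.shape)  # (200, 10)
```

Because cells sharing a patch get identical rows, the number of informative components is bounded by the number of distinct patches they map to.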