Getting Started

Installation

Scyan can be installed on every OS with pip or Poetry.

On macOS and Linux, Scyan requires python>=3.8,<3.11; on Windows, it requires python>=3.8,<3.10. Python 3.9 is preferred.
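The version constraints above can be checked from the interpreter before installing. The following is only a minimal sketch: the helper name `python_supported` is hypothetical (it is not part of Scyan), and it simply encodes the ranges stated above.

```python
import platform
import sys


def python_supported(version=None, system=None):
    """Return True if the Python version falls in the range stated above.

    Hypothetical helper, not part of Scyan.
    """
    version = tuple(version or sys.version_info)[:2]
    system = system or platform.system()
    # The upper bound is exclusive: <3.10 on Windows, <3.11 on macOS / Linux.
    upper = (3, 10) if system == "Windows" else (3, 11)
    return (3, 8) <= version < upper


print(python_supported())
```

Called without arguments, the function checks the running interpreter and operating system; passing a version tuple and a system name allows checking another configuration.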

Advice (optional)

We advise creating a new environment via a package manager (except if you use Poetry, which will automatically create the environment).

For instance, you can create a new conda environment:

conda create --name scyan python=3.9
conda activate scyan

Choose one of the following, depending on your needs (it should take at most a few minutes):

From PyPI

pip install scyan

Local install (pip)

git clone https://github.com/MICS-Lab/scyan.git
cd scyan

pip install .

Local install (pip, dev mode)

git clone https://github.com/MICS-Lab/scyan.git
cd scyan

pip install -e '.[dev,hydra,discovery]'

Poetry (dev mode)

git clone https://github.com/MICS-Lab/scyan.git
cd scyan

poetry install -E 'dev hydra discovery'
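
Whichever option is chosen, one way to confirm that the package is visible in the active environment is to query its metadata with the standard library. This is a sketch rather than an official verification step; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package="scyan"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"scyan {version}" if version else "scyan is not installed in this environment")
```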

Usage

import scyan

adata, table = scyan.data.load("aml") # Automatic loading

model = scyan.Scyan(adata, table)
model.fit()
model.predict()

This code should run in approximately 40 seconds (once the dataset is loaded).

adata is an AnnData object, whose variables (adata.var) correspond to markers, and whose observations (adata.obs) correspond to cells. adata.X is a matrix of size (\(N\) cells, \(M\) markers) representing cell-marker expressions after being preprocessed (asinh or logicle) and standardized.
table is a pandas DataFrame with \(P\) rows (one per population) and \(M\) columns (one per marker). Each value represents the knowledge about the expected expression, i.e. -1 for negative expression, 1 for positive expression, or NA if we don't know. It can also be any float value, such as 0 or -0.5 for mid and low expression, respectively (use it only when necessary).
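
To make the two shapes concrete, here is a toy illustration built with NumPy and pandas only. Everything in it is an assumption made for the example: the marker and population names, the random intensities, the table values, and the asinh cofactor of 5. It does not use Scyan's own preprocessing functions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical marker and population names, for illustration only.
markers = ["CD3", "CD4", "CD8", "CD19"]
populations = ["CD4 T cells", "CD8 T cells", "B cells"]

# Toy raw intensities: N = 1000 cells, M = 4 markers.
raw = rng.gamma(shape=2.0, scale=50.0, size=(1000, len(markers)))

# asinh transform (the cofactor 5 is an assumption), then per-marker standardization.
X = np.arcsinh(raw / 5.0)
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Knowledge table: P = 3 populations (rows) x M = 4 markers (columns).
# 1 = positive, -1 = negative, NaN = unknown.
table = pd.DataFrame(
    [
        [1, 1, -1, -1],
        [1, -1, 1, -1],
        [-1, np.nan, np.nan, 1],
    ],
    index=populations,
    columns=markers,
    dtype=float,
)

print(X.shape, table.shape)
```

In a real analysis, X would be stored in an AnnData object whose var_names are the marker names, so that the columns of the table can be matched against them.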

Help to create the adata object and the table

Read the preprocessing tutorial if you have an FCS file and want guidance on initializing Scyan. You can also look at existing tables.

Check

Make sure every marker from the table (i.e. the column names of the DataFrame) is inside the data, i.e. in adata.var_names.
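
This check can be sketched in a few lines. The helper `missing_markers` is hypothetical (not part of Scyan), and the example uses a pandas Index with made-up marker names as a stand-in for adata.var_names so that it runs without any dataset.

```python
import pandas as pd


def missing_markers(table, var_names):
    """Return the table columns that are absent from var_names."""
    known = set(var_names)
    return [marker for marker in table.columns if marker not in known]


# Stand-ins for `table` and `adata.var_names` (hypothetical marker names).
table = pd.DataFrame(
    [[1.0, -1.0, 1.0]],
    index=["Population A"],
    columns=["CD3", "CD19", "CD56"],
)
var_names = pd.Index(["CD3", "CD19", "CD45"])

missing = missing_markers(table, var_names)
if missing:
    print(f"Markers in the table but not in the data: {missing}")
```

With real objects, the same call would be missing_markers(table, adata.var_names); an empty list means every marker of the table is present in the data.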