How to Submit Single-Cell Data

Submitting well-annotated single-cell datasets helps them comply with the FAIR principles: Findable, Accessible, Interoperable, and Reusable. Standardized metadata improves reproducibility, enables integration into reference atlases (e.g. HCA, MCA, FCA), and facilitates use in analysis platforms such as CELLxGENE. In addition, the CELLxGENE data portal enforces a versioned schema (currently version 6.0.0) to ensure consistency across datasets.

What file format should I use?

Common file formats include Seurat objects (.rds) and HDF5-based files such as LOOM (.loom) or AnnData (.h5ad), among others.

At scFAIR, we recommend the .h5ad format, following the default paths and metadata described in CELLxGENE schema 6.0.0 and summarized below. Please include at least the raw (unfiltered) count matrix in your file, in either /X or /raw/X. Please also include the normalized counts if available. If the data is integrated, it is also good practice to keep the integrated embedding in the file.

Required Metadata Fields (AnnData .h5ad)

Use the CELLxGENE schema 6.0.0 for full metadata specs:

📁 Dataset-level (/uns)

  • title and description
  • contact (name + email)
  • Publication/preprint DOI
  • URLs (e.g. GEO accession, protocol links)
  • default_embedding, comments
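The dataset-level items above can be collected in a plain dictionary before being copied into the file's uns slot. The sketch below is only an illustration: apart from title and default_embedding, the key names (description, contact_name, contact_email, doi, urls, comments) are hypothetical spellings chosen for this example, and all values are placeholders. Check the schema for the exact keys it expects.

```python
# Key names other than "title" and "default_embedding" are hypothetical.
dataset_info = {
    "title": "Example single-cell atlas",
    "description": "Placeholder description of the dataset.",
    "contact_name": "Jane Doe",
    "contact_email": "jane.doe@example.org",
    "doi": "",      # fill in the publication or preprint DOI
    "urls": [],     # e.g. GEO accession page, protocol links
    "default_embedding": "X_umap",
    "comments": "",
}

# Items this sketch treats as expected; "comments" is left optional here.
EXPECTED_KEYS = [
    "title",
    "description",
    "contact_name",
    "contact_email",
    "doi",
    "urls",
    "default_embedding",
]


def missing_or_empty(info, keys):
    """Return the keys that are absent from info or hold an empty value."""
    return [key for key in keys if not info.get(key)]


print(missing_or_empty(dataset_info, EXPECTED_KEYS))  # prints ['doi', 'urls']
```

With an AnnData object in hand, a dictionary like this could be merged into the file with adata.uns.update(dataset_info).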

📁 Cell-level metadata (obs)

Use controlled vocabularies (NCBITaxon, UBERON, CL, MONDO, etc.) as required by the schema.

Refer to the full CELLxGENE schema 6.0.0 for complete metadata definitions and ontologies.

In short, the schema asks you to provide the following fields for each cell:

organism_ontology_term_id

Organism. The CELLxGENE schema currently supports only a restricted list of organisms, but you can use the taxon of any organism if you submit to another portal.

tissue_ontology_term_id

Tissue

cell_type_ontology_term_id

Cell types (annotated / curated). Because the terms come from an ontology, it is better to provide the most detailed cell type available: broader annotations can be recovered by navigating the ontology graph.

development_stage_ontology_term_id

Development stage: embryonic or adult. If adult, please provide the age in days or years (depending on the species).

sex_ontology_term_id

Male, Female, Mixed sex (if mixed and unknown at cell level), Hermaphrodite or Unknown.

self_reported_ethnicity_ontology_term_id

For Homo sapiens samples.

disease_ontology_term_id

Use PATO:0000461 for normal or healthy samples; otherwise, find your condition in the MONDO ontology.
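The recommendation for cell_type_ontology_term_id above (annotate with the most detailed term) relies on being able to walk up the ontology to a broader term. The sketch below shows the idea on a toy example. The parent map is a hand-simplified excerpt written for this illustration, not the real Cell Ontology graph, which contains intermediate terms and allows several parents per term; the function names are likewise hypothetical.

```python
# Toy, hand-simplified "is_a" links (child -> parent); not the real CL graph.
TOY_PARENT = {
    "CL:0000624": "CL:0000084",  # CD4-positive, alpha-beta T cell -> T cell
    "CL:0000084": "CL:0000542",  # T cell -> lymphocyte
    "CL:0000542": "CL:0000738",  # lymphocyte -> leukocyte
}


def ancestors(term_id, parent_map):
    """Return the chain of broader terms above term_id, nearest first."""
    chain = []
    seen = {term_id}
    current = term_id
    while current in parent_map:
        current = parent_map[current]
        if current in seen:  # guard against accidental cycles in the map
            break
        seen.add(current)
        chain.append(current)
    return chain


def roll_up(term_id, broad_terms, parent_map):
    """Return the first term among term_id and its ancestors found in broad_terms."""
    for candidate in [term_id] + ancestors(term_id, parent_map):
        if candidate in broad_terms:
            return candidate
    return None


# A detailed annotation can be summarized at the "lymphocyte" level.
print(roll_up("CL:0000624", {"CL:0000542"}, TOY_PARENT))  # prints CL:0000542
```

The reverse is not possible: a cell annotated only as a leukocyte cannot be refined to a T cell subtype from the ontology alone, which is why the most specific term is preferred at submission time.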

[Optional] Additional metadata

batch, donor_id, sample_id, clustering, doublets, etc. Include any other metadata that could be useful for interoperability and reproducibility (usually, the more the better).

At least one 2D embedding stored in obsm (e.g. X_umap, X_pca)
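Before running a full validator, a quick local check of the cell-level fields listed above can catch missing columns and free-text values. The sketch below is a minimal, unofficial pre-check: the field list is copied from this guide, the "PREFIX:digits" pattern is only a rough shape test, and the set of accepted non-ontology values ("unknown", "na") is an assumption. It does not verify that a term exists or is allowed for a given field, so it is no substitute for schema validation.

```python
import re

# Field names copied from the list in this guide.
REQUIRED_OBS_FIELDS = [
    "organism_ontology_term_id",
    "tissue_ontology_term_id",
    "cell_type_ontology_term_id",
    "development_stage_ontology_term_id",
    "sex_ontology_term_id",
    "self_reported_ethnicity_ontology_term_id",
    "disease_ontology_term_id",
]

# Rough shape of an ontology term ID, e.g. "UBERON:0002048".
TERM_SHAPE = re.compile(r"^[A-Za-z]+:\d+$")

# Assumption: non-ontology values tolerated by this sketch.
NON_ONTOLOGY_VALUES = {"unknown", "na"}


def check_obs(columns):
    """Check a dict of column name -> per-cell values; return a list of problems."""
    problems = []
    for field in REQUIRED_OBS_FIELDS:
        if field not in columns:
            problems.append(f"missing column: {field}")
            continue
        bad = sorted(
            {
                str(value)
                for value in columns[field]
                if str(value) not in NON_ONTOLOGY_VALUES
                and not TERM_SHAPE.match(str(value))
            }
        )
        if bad:
            problems.append(f"{field}: unexpected values {bad}")
    return problems


# Toy input; with pandas, adata.obs.to_dict(orient="list") gives this shape.
example = {
    "organism_ontology_term_id": ["NCBITaxon:9606", "NCBITaxon:9606"],
    "tissue_ontology_term_id": ["UBERON:0002048", "lung"],
    "disease_ontology_term_id": ["PATO:0000461", "PATO:0000461"],
}
for problem in check_obs(example):
    print(problem)
```

Here the free-text value "lung" is flagged alongside the columns that are absent from the toy input.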

Submission & Validation

You can validate your .h5ad with this validation tool.

Next, you can submit validated files to any supported portal such as CELLxGENE, ASAP, or SingleCellPortal. You can also upload your .h5ad file to GEO or ArrayExpress, along with your sequencing data.