Single-Cell and Spatial Omics Analysis
Single-cell and spatial omics let you answer two core biological questions at once: which cell states exist, and where they are located in tissue.
Learning Goals
By the end of this chapter, you should be able to:
- Explain core concepts in single-cell and spatial omics workflows.
- Run a basic scRNA-seq analysis pipeline from quality control to annotation.
- Understand batch correction and integration strategies across datasets.
- Perform differential expression and pathway interpretation by cell type.
- Connect spatial context to cell-state biology for stronger interpretation.
Why This Path Matters
Bulk omics averages signals across many cell types. Single-cell and spatial omics preserve heterogeneity and tissue architecture, which is critical for:
- Tumor microenvironment profiling.
- Immune response mapping.
- Developmental trajectory studies.
- Precision biomarker discovery.
Core Data Types
scRNA-seq
- Measures transcript abundance per individual cell.
- Typical output: gene-by-cell count matrix + metadata.
Spatial Transcriptomics
- Measures expression in physical tissue coordinates.
- Typical output: expression matrix + spot coordinates + histology image.
Optional companion modalities
- CITE-seq (RNA + surface proteins).
- scATAC-seq (chromatin accessibility).
- Multiome (paired RNA + ATAC).
Recommended Analysis Workflow
1) Experimental Design and Metadata
Capture this before analysis:
- Sample groups and contrasts.
- Batch sources (patient, run date, chemistry, site).
- Tissue region, disease stage, treatment metadata.
- Expected cell populations and known markers.
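Incomplete or confounded metadata (for example, every treated sample processed in a single batch) cannot be repaired downstream and leads to major interpretation errors.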
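A quick way to catch a confounded design before any analysis is to cross-tabulate batch against condition in the sample manifest. The base R sketch below is illustrative only: the `manifest` data frame, its column names, and the helper `is_confounded` are hypothetical and not part of any package.

```r
# Hypothetical sample manifest; column names are assumptions for illustration.
manifest <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4", "S5", "S6"),
  condition = c("control", "control", "control", "treated", "treated", "treated"),
  batch     = c("run1", "run1", "run2", "run2", "run3", "run3"),
  stringsAsFactors = FALSE
)

# Cross-tabulate batch against condition to see how samples are distributed.
design_table <- table(manifest$batch, manifest$condition)
print(design_table)

# A design is fully confounded when every batch contains only one condition,
# so batch and condition effects cannot be separated.
is_confounded <- function(batch, condition) {
  tab <- table(batch, condition)
  all(rowSums(tab > 0) == 1)
}

print(is_confounded(manifest$batch, manifest$condition))
```

Here `run2` contains both conditions, so the check does not report full confounding. A partially confounded design can still be weak, so inspect the table itself and not only the logical result.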
2) Quality Control (QC)
Typical QC metrics:
- Number of detected genes per cell (nFeature_RNA).
- Total counts per cell (nCount_RNA).
- Mitochondrial percentage (percent.mt).
- Doublet probability.
Example (Seurat, R)
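library(Seurat)
# counts_matrix: a gene-by-cell count matrix loaded beforehand (e.g. with Read10X)
obj <- CreateSeuratObject(counts = counts_matrix, project = "sc_project")
# "^MT-" matches human mitochondrial genes; mouse gene symbols use "^mt-"
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")
# Example thresholds; tune per dataset
obj <- subset(
  obj,
  subset = nFeature_RNA > 300 &
    nFeature_RNA < 7000 &
    percent.mt < 15
)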
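To make the QC metrics concrete, the following base R sketch computes them by hand on a toy matrix. The counts and the placeholder gene names are invented for illustration; Seurat fills in `nCount_RNA` and `nFeature_RNA` itself when the object is created.

```r
# Toy gene-by-cell count matrix (rows = genes, columns = cells); values are made up.
counts <- matrix(
  c(5, 0, 2,
    0, 3, 1,
    4, 1, 0,
    1, 6, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2", "MT-CO1", "MT-ND1"),
                  c("cell_a", "cell_b", "cell_c"))
)

n_count   <- colSums(counts)       # total counts per cell (nCount_RNA)
n_feature <- colSums(counts > 0)   # detected genes per cell (nFeature_RNA)

# Percentage of counts from mitochondrial genes (human "MT-" prefix assumed).
is_mt      <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[is_mt, , drop = FALSE]) / n_count

qc <- data.frame(n_count, n_feature, percent_mt)
print(qc)
```

Seeing the metrics as plain column sums makes it easier to reason about what a given threshold removes.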
3) Normalization and Feature Selection
Common practice:
- Library-size normalization (LogNormalize) or SCTransform.
- Identify highly variable genes.
- Scale data before dimensionality reduction.
obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 3000)
obj <- ScaleData(obj)
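The arithmetic behind library-size log-normalization is short enough to write out. This sketch reproduces it on a toy matrix in base R: divide each cell's counts by that cell's total, multiply by a scale factor of 10,000, then apply `log1p`. It is an illustration of the formula, not Seurat's implementation, and the matrix values are invented.

```r
# Toy gene-by-cell counts; values are made up.
counts <- matrix(
  c(1, 2,
    3, 0,
    0, 2,
    6, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("GENE", 1:4), c("cell_a", "cell_b"))
)

scale_factor <- 10000
lib_size <- colSums(counts)                 # total counts per cell

# Divide each column by its library size, rescale, then log-transform.
scaled   <- sweep(counts, 2, lib_size, "/") * scale_factor
log_norm <- log1p(scaled)

print(round(log_norm, 3))
```

After this step every cell has the same total on the rescaled (pre-log) scale, so differences in sequencing depth no longer dominate comparisons between cells.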
4) Dimensionality Reduction and Clustering
- PCA for compact representation.
- Graph construction.
- Cluster assignment.
- UMAP/t-SNE visualization.
obj <- RunPCA(obj, npcs = 50)
obj <- FindNeighbors(obj, dims = 1:30)
obj <- FindClusters(obj, resolution = 0.5)
obj <- RunUMAP(obj, dims = 1:30)
DimPlot(obj, reduction = "umap", label = TRUE)
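The choice of `dims = 1:30` above is a judgment call. One common sanity check is to look at how much variance each principal component explains. The base R sketch below does this with `prcomp` on random toy data; the 80% cumulative target is a hypothetical cutoff chosen only for illustration. Seurat's `ElbowPlot` offers a related view on a real object.

```r
set.seed(1)

# Toy expression matrix: 20 cells (rows) x 5 genes (columns), random values.
expr <- matrix(rnorm(100), nrow = 20, ncol = 5)

# PCA on centered, scaled data.
pca <- prcomp(expr, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, and the running total.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of PCs reaching the (hypothetical) 80% target.
n_pcs <- which(cum_var >= 0.8)[1]

print(round(var_explained, 3))
print(n_pcs)
```

On real scRNA-seq data the curve usually flattens well before the last computed PC, and that flattening point is a reasonable starting value for `dims`.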
5) Cell Type Annotation
Use three complementary strategies:
- Canonical marker genes (manual curation).
- Reference mapping (Azimuth, SingleR, CellTypist).
- Marker-driven confidence scoring.
markers <- FindAllMarkers(obj, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
head(markers)
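Marker-driven confidence scoring can be as simple as averaging a marker panel within each cluster and comparing the best score to the runner-up. The sketch below is a minimal base R illustration under stated assumptions: the cluster-average values are invented, the marker panel is a small example and not a curated reference, and the "margin" is a crude confidence measure introduced here for illustration.

```r
# Toy cluster-average expression (rows = genes, columns = clusters); values are made up.
avg_expr <- matrix(
  c(5.0, 0.2, 0.1,
    4.0, 0.1, 0.3,
    0.2, 3.5, 0.2,
    0.1, 4.5, 0.1,
    0.3, 0.2, 6.0),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("CD3D", "CD3E", "MS4A1", "CD79A", "LYZ"),
                  c("cluster0", "cluster1", "cluster2"))
)

# Small illustrative marker panel.
marker_sets <- list(
  T_cell   = c("CD3D", "CD3E"),
  B_cell   = c("MS4A1", "CD79A"),
  Monocyte = c("LYZ")
)

# Score = mean expression of each marker set within each cluster.
# Result: rows = clusters, columns = candidate labels.
scores <- sapply(marker_sets, function(genes) {
  colMeans(avg_expr[genes, , drop = FALSE])
})

# Best-scoring label per cluster, plus the margin over the runner-up.
best_label <- colnames(scores)[apply(scores, 1, which.max)]
margin <- apply(scores, 1, function(s) {
  sorted <- sort(s, decreasing = TRUE)
  unname(sorted[1] - sorted[2])
})

annotation <- data.frame(cluster = rownames(scores), label = best_label, margin = margin)
print(annotation)
```

A small margin signals a cluster that deserves manual review or a second annotation method before it is given a name.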
6) Batch Correction and Multi-Sample Integration
When integrating cohorts or sites, compare methods:
- Seurat integration anchors.
- Harmony.
- Scanorama / BBKNN (Python ecosystem).
Goal: remove technical effects while preserving biology.
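# Harmony example sketch; assumes PCA has been run and a "batch" metadata column exists
library(harmony)
obj <- RunHarmony(obj, group.by.vars = "batch")
# Use the corrected embedding for neighbors, clusters, and UMAP
obj <- FindNeighbors(obj, reduction = "harmony", dims = 1:30)
obj <- FindClusters(obj, resolution = 0.5)
obj <- RunUMAP(obj, reduction = "harmony", dims = 1:30)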
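One simple check on integration is how evenly batches are represented within each cluster. The base R sketch below works on a toy metadata table; the labels and the 90% cutoff are hypothetical. On a Seurat object, the same table could be built from the cluster and batch metadata columns (for example `obj$seurat_clusters` and `obj$batch`, assuming those columns exist).

```r
# Toy per-cell metadata; labels are made up for illustration.
meta <- data.frame(
  cluster = c("0", "0", "0", "0", "1", "1", "1", "1"),
  batch   = c("A", "B", "A", "B", "A", "A", "A", "A")
)

# Fraction of each cluster contributed by each batch (each row sums to 1).
mix <- prop.table(table(meta$cluster, meta$batch), margin = 1)
print(round(mix, 2))

# Flag clusters dominated by a single batch (hypothetical 90% cutoff).
dominated <- rownames(mix)[apply(mix, 1, max) > 0.9]
print(dominated)
```

A batch-dominated cluster is not automatically a technical artifact. It may be a cell population that truly exists in only one sample, so check its markers before deciding.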
7) Differential Expression and Pathway Analysis
Run contrasts at the right level:
- Cell type-specific DE (preferred).
- Pseudobulk DE by sample and cell type for robust inference.
- Enrichment with GO/KEGG/Reactome.
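# Assumes metadata columns "celltype" (e.g. "T_cell") and "condition" ("treated"/"control")
obj$celltype_condition <- paste(obj$celltype, obj$condition, sep = "_")
Idents(obj) <- "celltype_condition"
de_tcells <- FindMarkers(obj, ident.1 = "T_cell_treated", ident.2 = "T_cell_control")
head(de_tcells)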
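Pseudobulk DE treats the sample, not the cell, as the unit of replication. The aggregation step is just a grouped sum of raw counts per sample and cell type. The sketch below shows it in base R on a toy matrix; all values and labels are invented. The resulting matrix is the kind of input that bulk tools such as edgeR or DESeq2 expect.

```r
# Toy gene-by-cell counts; values and labels are made up.
counts <- matrix(
  c(1, 2, 0, 3, 4, 1,
    0, 1, 5, 0, 2, 2),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GENE1", "GENE2"), paste0("cell", 1:6))
)

# Per-cell metadata, in the same order as the columns of `counts`.
cell_meta <- data.frame(
  sample   = c("S1", "S1", "S1", "S2", "S2", "S2"),
  celltype = c("T_cell", "T_cell", "B_cell", "T_cell", "B_cell", "B_cell")
)

# One group per sample and cell type combination.
group <- paste(cell_meta$sample, cell_meta$celltype, sep = "|")

# rowsum() sums rows by group, so transpose to cells-by-genes first,
# then transpose back to genes-by-groups.
pseudobulk <- t(rowsum(t(counts), group = group))
print(pseudobulk)
```

Summing raw counts, not normalized values, keeps the data on the scale that count-based DE models assume.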
8) Spatial Transcriptomics Integration
Use spatial data to validate where cell states localize.
Key analyses:
- Spatial clustering.
- Spot deconvolution using scRNA-seq references.
- Region-specific pathway signatures.
Example sketch (Seurat spatial)
spatial <- Load10X_Spatial(data.dir = "spatial_sample/")
spatial <- SCTransform(spatial, assay = "Spatial", verbose = FALSE)
spatial <- RunPCA(spatial)
spatial <- RunUMAP(spatial, dims = 1:30)
SpatialFeaturePlot(spatial, features = c("EPCAM", "COL1A1"))
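The idea behind spot deconvolution can be illustrated with non-negative least squares: model each spot as a non-negative mixture of cell-type signatures derived from scRNA-seq. The base R sketch below is a toy under stated assumptions. The signature values are invented, the spot is simulated from known proportions, and `optim` with a lower bound of zero stands in for a dedicated solver. Published deconvolution methods use more elaborate statistical models.

```r
# Hypothetical reference signatures (rows = genes, columns = cell types),
# e.g. cluster-average expression from scRNA-seq; values are made up.
signatures <- matrix(
  c(10, 1, 0,
     8, 0, 1,
     1, 9, 0,
     0, 7, 2,
     0, 1, 9),
  nrow = 5, byrow = TRUE,
  dimnames = list(c("EPCAM", "KRT8", "COL1A1", "COL3A1", "PTPRC"),
                  c("epithelial", "fibroblast", "immune"))
)

# Simulate one spot as a known mixture so the estimate can be checked.
true_props <- c(epithelial = 0.6, fibroblast = 0.3, immune = 0.1)
spot <- as.vector(signatures %*% true_props)

# Squared-error loss and its gradient for mixture weights w.
loss <- function(w) sum((signatures %*% w - spot)^2)
grad <- function(w) as.vector(2 * t(signatures) %*% (signatures %*% w - spot))

# Box-constrained optimisation keeps all weights non-negative.
fit <- optim(par = rep(1 / 3, 3), fn = loss, gr = grad,
             method = "L-BFGS-B", lower = 0)

# Rescale weights to proportions.
est_props <- setNames(fit$par / sum(fit$par), colnames(signatures))
print(round(est_props, 3))
```

Because the simulated spot has no noise, the estimate should land close to `true_props`. Real spots are noisy and sparse, which is one reason to interpret estimated proportions with the spatial resolution limits in mind.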
Best Practices for Reproducibility
- Keep a sample manifest and fixed metadata schema.
- Save intermediate objects (.rds or .h5ad) per major step.
- Track package versions and parameters.
- Use consistent QC thresholds with documented rationale.
- Separate exploratory from confirmatory analyses.
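Several of these practices take only a few lines of base R. The sketch below checkpoints an object, records the session, and writes parameters to disk. The file names, the output directory, and the stand-in object `obj_stub` are hypothetical. In a real analysis the saved object would be the Seurat object itself.

```r
# Stand-in for an analysis object; in practice this would be the Seurat object.
obj_stub <- list(step = "qc", n_cells = 1234L)

# Hypothetical output location (a temporary directory for this sketch).
out_dir <- file.path(tempdir(), "checkpoints")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Save the object after a major step, then reload it to confirm it round-trips.
rds_path <- file.path(out_dir, "01_qc.rds")
saveRDS(obj_stub, rds_path)
restored <- readRDS(rds_path)

# Record R and package versions alongside the checkpoint.
session_path <- file.path(out_dir, "session_info.txt")
writeLines(capture.output(sessionInfo()), session_path)

# Record analysis parameters in a simple text file.
params <- list(min_features = 300, max_features = 7000, max_percent_mt = 15)
params_path <- file.path(out_dir, "params.txt")
writeLines(paste(names(params), unlist(params), sep = " = "), params_path)
```

Numbering checkpoint files by step makes it clear which object feeds which downstream analysis.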
Common Pitfalls
- Over-filtering rare but biologically important cells.
- Calling clusters as cell types without marker validation.
- Treating batch correction outputs as absolute truth.
- Mixing donor/sample effects with true biological effects.
- Ignoring spatial resolution limitations during interpretation.
Suggested Practice Datasets
- PBMC 3k (Seurat tutorial baseline).
- 10x Visium public datasets for spatial workflows.
- Multi-donor immune datasets for batch/integration practice.
Summary
Single-cell plus spatial omics provides a high-resolution framework to discover cell states, interactions, and tissue context. Strong analysis depends on disciplined QC, robust integration, biologically grounded annotation, and reproducible reporting.