Single-Cell and Spatial Omics Analysis


Single-cell and spatial omics let you answer two core biological questions at once: which cell states exist, and where they are located in tissue.

Learning Goals

By the end of this chapter, you should be able to:

Explain core concepts in single-cell and spatial omics workflows.
Run a basic scRNA-seq analysis pipeline from quality control to annotation.
Understand batch correction and integration strategies across datasets.
Perform differential expression and pathway interpretation by cell type.
Connect spatial context to cell-state biology for stronger interpretation.

Why This Path Matters

Bulk omics averages signals across many cell types. Single-cell and spatial omics preserve heterogeneity and tissue architecture, which is critical for:

Tumor microenvironment profiling.
Immune response mapping.
Developmental trajectory studies.
Precision biomarker discovery.

Core Data Types

scRNA-seq

Measures transcript abundance per individual cell.
Typical output: gene-by-cell count matrix + metadata.

Spatial Transcriptomics

Measures expression at known physical coordinates within a tissue section.
Typical output: expression matrix + spot coordinates + histology image.

Optional companion modalities

CITE-seq (RNA + surface proteins).
scATAC-seq (chromatin accessibility).
Multiome (paired RNA + ATAC).

Recommended Analysis Workflow

1) Experimental Design and Metadata

Capture this before analysis:

Sample groups and contrasts.
Batch sources (patient, run date, chemistry, site).
Tissue region, disease stage, treatment metadata.
Expected cell populations and known markers.

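Incomplete or confounded metadata cannot be repaired downstream. If, for example, every treated sample was processed in a single batch, technical and biological effects become inseparable.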
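
One way to catch confounding before any analysis is to cross-tabulate the sample manifest. The sketch below uses base R only; the manifest columns (sample, condition, batch) and the helper name is_confounded are hypothetical and would need to match your own metadata schema.

```r
# Hypothetical sample manifest: condition and batch are perfectly aligned
manifest_bad <- data.frame(
  sample    = paste0("s", 1:6),
  condition = rep(c("control", "treated"), each = 3),
  batch     = c("run1", "run1", "run1", "run2", "run2", "run2"))

# Same samples, but each run contains both conditions
manifest_ok <- transform(manifest_bad,
                         batch = c("run1", "run2", "run1", "run2", "run1", "run2"))

# TRUE when every level of factor_b contains only one level of factor_a.
# This detects complete confounding only; partial imbalance still needs
# a manual look at the table.
is_confounded <- function(manifest, factor_a = "condition", factor_b = "batch") {
  design <- table(manifest[[factor_a]], manifest[[factor_b]])
  all(colSums(design > 0) == 1)
}

table(manifest_bad$condition, manifest_bad$batch)
is_confounded(manifest_bad)
is_confounded(manifest_ok)
```

A fully confounded design is best fixed at the bench (re-randomizing samples across runs), since no correction method can separate the two effects afterwards.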

2) Quality Control (QC)

Typical QC metrics:

Number of detected genes per cell (nFeature_RNA).
Total counts per cell (nCount_RNA).
Mitochondrial percentage (percent.mt).
Doublet probability.

Example (Seurat, R)

library(Seurat)

obj <- CreateSeuratObject(counts = counts_matrix, project = "sc_project")
# "^MT-" matches human mitochondrial gene symbols; mouse symbols use "^mt-"
obj[["percent.mt"]] <- PercentageFeatureSet(obj, pattern = "^MT-")

# Example thresholds; tune per dataset
obj <- subset(
  obj,
  subset = nFeature_RNA > 300 &
           nFeature_RNA < 7000 &
           percent.mt < 15
)
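
Fixed cutoffs like the ones above rarely transfer between datasets. A common alternative is to flag cells that sit several median absolute deviations (MADs) from the median of each metric. The following is a minimal base R sketch on a simulated count matrix; the helper is_outlier and the choice of 3 MADs are illustrative assumptions, not a Seurat function or a recommended default.

```r
# Toy gene-by-cell count matrix (simulated values, for illustration only)
set.seed(1)
genes <- c("MT-CO1", "MT-ND1", paste0("GENE", 1:48))
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50,
                 dimnames = list(genes, paste0("cell", 1:200)))

# Per-cell metrics analogous to nCount_RNA, nFeature_RNA and percent.mt
n_count   <- colSums(counts)
n_feature <- colSums(counts > 0)
is_mt     <- grepl("^MT-", rownames(counts))
pct_mt    <- 100 * colSums(counts[is_mt, , drop = FALSE]) / n_count

# Flag values more than `nmads` MADs away from the median
is_outlier <- function(x, nmads = 3, direction = c("both", "lower", "higher")) {
  direction <- match.arg(direction)
  dev <- (x - median(x)) / mad(x)
  switch(direction,
         both   = abs(dev) > nmads,
         lower  = dev < -nmads,
         higher = dev > nmads)
}

# Drop cells with unusually low complexity or unusually high mitochondrial share
keep <- !is_outlier(log1p(n_feature), direction = "lower") &
        !is_outlier(log1p(n_count), direction = "lower") &
        !is_outlier(pct_mt, direction = "higher")
table(keep)
```

Whichever rule is used, it helps to plot the metric distributions per sample first, because a threshold that is reasonable for one tissue can remove an entire cell population in another.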

3) Normalization and Feature Selection

Common practice:

Library-size normalization (LogNormalize) or SCTransform.
Identify highly variable genes.
Scale data before dimensionality reduction.

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj, selection.method = "vst", nfeatures = 3000)
obj <- ScaleData(obj)
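
To make the normalization step less of a black box, here is a small base R sketch of the arithmetic behind log-normalization: each cell's counts are divided by that cell's total, multiplied by a scale factor (10,000 here), and transformed with log1p. The toy matrix and the function name log_normalize are made up for illustration.

```r
# Toy gene-by-cell counts; every cell has a library size of 10
counts <- matrix(c(1, 0, 3,
                   9, 5, 0,
                   0, 5, 7),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2", "cell3")))

log_normalize <- function(counts, scale_factor = 1e4) {
  lib_size <- colSums(counts)
  # sweep() divides each column (cell) by its own library size
  log1p(sweep(counts, 2, lib_size, "/") * scale_factor)
}

norm <- log_normalize(counts)
norm

# Undoing the log shows that every cell now sums to the scale factor
colSums(expm1(norm))
```

Because every cell is rescaled to the same total, differences in sequencing depth no longer dominate comparisons between cells.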

4) Dimensionality Reduction and Clustering

PCA for compact representation.
Graph construction.
Cluster assignment.
UMAP/t-SNE visualization.

obj <- RunPCA(obj, npcs = 50)
obj <- FindNeighbors(obj, dims = 1:30)
obj <- FindClusters(obj, resolution = 0.5)
obj <- RunUMAP(obj, dims = 1:30)
DimPlot(obj, reduction = "umap", label = TRUE)
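
The dims = 1:30 setting above is a choice, not a constant. One rough way to reason about it is to look at how much variance each principal component explains (Seurat's ElbowPlot() shows a similar view of the standard deviations). The sketch below uses base R prcomp on simulated data with three hidden programs; the 80% cumulative-variance cutoff is an arbitrary assumption for illustration.

```r
set.seed(42)
n_cells <- 300
n_genes <- 60

# Simulated data: three latent programs drive all genes, plus noise
latent   <- matrix(rnorm(n_cells * 3), ncol = 3)
loadings <- matrix(rnorm(3 * n_genes), nrow = 3)
expr     <- latent %*% loadings +
            matrix(rnorm(n_cells * n_genes, sd = 0.5), ncol = n_genes)

# PCA on cells x genes, with genes centered and scaled
pca <- prcomp(expr, center = TRUE, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of PCs reaching the (arbitrary) 80% cutoff
n_pcs <- which(cum_var >= 0.80)[1]
round(head(var_explained, 5), 3)
n_pcs
```

On real data the drop-off is usually less sharp than in this simulation, so it is worth checking that clusters are stable across a few nearby choices.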

5) Cell Type Annotation

Use three complementary strategies:

Canonical marker genes (manual curation).
Reference mapping (Azimuth/SingleR/CellTypist).
Marker-driven confidence scoring.

markers <- FindAllMarkers(obj, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
head(markers)
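
Marker-driven confidence scoring can be as simple as comparing how strongly each cluster expresses each curated marker set. The base R sketch below scores three clusters against three marker sets and reports the margin between the best and second-best label as a crude confidence value. The expression values are invented, and the margin-based score is one possible heuristic rather than an established method.

```r
# Hypothetical cluster-average expression (genes x clusters)
avg_expr <- matrix(
  c(2.5, 0.1, 0.2,   # CD3D
    2.1, 0.0, 0.3,   # CD3E
    0.1, 3.0, 0.2,   # MS4A1
    0.0, 2.4, 0.1,   # CD79A
    0.2, 0.1, 2.8,   # LYZ
    0.1, 0.3, 2.2),  # CD14
  ncol = 3, byrow = TRUE,
  dimnames = list(c("CD3D", "CD3E", "MS4A1", "CD79A", "LYZ", "CD14"),
                  c("0", "1", "2")))

marker_sets <- list(
  T_cell   = c("CD3D", "CD3E"),
  B_cell   = c("MS4A1", "CD79A"),
  Monocyte = c("LYZ", "CD14"))

# Mean expression of each marker set per cluster (cell types x clusters)
scores <- t(sapply(marker_sets,
                   function(g) colMeans(avg_expr[g, , drop = FALSE])))

# Best label per cluster, and its margin over the runner-up
best   <- rownames(scores)[apply(scores, 2, which.max)]
margin <- apply(scores, 2, function(s) {
  s <- sort(s, decreasing = TRUE)
  s[1] - s[2]
})

data.frame(cluster = colnames(scores), label = best, margin = unname(margin))
```

A small margin is a prompt to inspect the cluster manually or to check it against a reference-mapping result before assigning a label.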

6) Batch Correction and Multi-Sample Integration

When integrating cohorts or sites, compare methods:

Seurat integration anchors.
Harmony.
Scanorama / BBKNN (Python ecosystem).

Goal: remove technical effects while preserving biology.

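# Harmony example sketch; assumes RunPCA() has already been run and that
# the metadata contains a "batch" column
library(harmony)
obj <- RunHarmony(obj, group.by.vars = "batch")
# Build the graph, clusters, and UMAP on the corrected "harmony" embedding
obj <- FindNeighbors(obj, reduction = "harmony", dims = 1:30)
obj <- FindClusters(obj, resolution = 0.5)
obj <- RunUMAP(obj, reduction = "harmony", dims = 1:30)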
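
A quick sanity check after integration is to ask how well batches mix within each cluster. The sketch below computes a normalized entropy of batch labels per cluster in base R (0 means a cluster comes from a single batch, 1 means batches are evenly represented). The labels and the helper name batch_entropy are hypothetical, and this is a rough diagnostic rather than a formal integration benchmark.

```r
# Hypothetical per-cell cluster and batch labels after integration
meta <- data.frame(
  cluster = c(rep("A", 6), rep("B", 6), rep("C", 4)),
  batch   = c("b1", "b2", "b1", "b2", "b1", "b2",   # A: balanced
              "b1", "b1", "b1", "b1", "b1", "b2",   # B: mostly b1
              "b2", "b2", "b2", "b2"))              # C: one batch only

batch_entropy <- function(cluster, batch) {
  tab  <- table(cluster, batch)
  prop <- prop.table(tab, margin = 1)       # batch proportions per cluster
  h <- apply(prop, 1, function(p) {
    p <- p[p > 0]                           # treat 0 * log(0) as 0
    -sum(p * log(p))
  })
  h / log(ncol(tab))                        # scale to the 0-1 range
}

round(batch_entropy(meta$cluster, meta$batch), 2)
```

Low entropy is not automatically a failure: a cell type that truly exists in only one batch should stay separate, which is why the result needs to be read alongside marker expression.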

7) Differential Expression and Pathway Analysis

Run contrasts at the right level:

Cell type-specific DE (preferred).
Pseudobulk DE by sample and cell type for robust inference.
Enrichment with GO/KEGG/Reactome.

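# Assumes metadata columns "celltype" and "condition" ("treated"/"control")
obj$celltype_condition <- paste(obj$celltype, obj$condition, sep = "_")
Idents(obj) <- "celltype_condition"
de_tcells <- FindMarkers(obj, ident.1 = "T_cell_treated", ident.2 = "T_cell_control")
head(de_tcells)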
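
Cell-level tests treat every cell as an independent replicate, which tends to overstate significance when cells come from only a few donors. Pseudobulk analysis addresses this by summing raw counts per sample and cell type before testing. The base R sketch below shows only the aggregation step on simulated counts; the metadata columns and the "__" separator are assumptions, and the resulting matrix would then go to a bulk DE tool of your choice.

```r
set.seed(7)
n_cells <- 120

# Simulated raw counts (genes x cells)
counts <- matrix(rpois(4 * n_cells, lambda = 3), nrow = 4,
                 dimnames = list(paste0("gene", 1:4),
                                 paste0("cell", 1:n_cells)))

# Hypothetical metadata: four samples, two cell types
meta <- data.frame(
  sample   = rep(c("s1", "s2", "s3", "s4"), each = 30),
  celltype = rep(c("T_cell", "B_cell"), times = 60))

# One pseudobulk column per sample x cell type combination
group <- paste(meta$sample, meta$celltype, sep = "__")

# rowsum() sums rows by group, so transpose to cells x genes and back
pseudobulk <- t(rowsum(t(counts), group = group))
dim(pseudobulk)
colnames(pseudobulk)
```

Each column now represents one biological replicate of one cell type, so the unit of inference matches the unit of sampling.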

8) Spatial Transcriptomics Integration

Use spatial data to validate where cell states localize.

Key analyses:

Spatial clustering.
Spot deconvolution using scRNA-seq references.
Region-specific pathway signatures.

Example sketch (Seurat spatial)

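# Load a 10x Visium output directory (the path is a placeholder)
spatial <- Load10X_Spatial(data.dir = "spatial_sample/")
spatial <- SCTransform(spatial, assay = "Spatial", verbose = FALSE)
spatial <- RunPCA(spatial, assay = "SCT", verbose = FALSE)
spatial <- RunUMAP(spatial, reduction = "pca", dims = 1:30)
# Overlay expression on the tissue image: epithelial (EPCAM) vs. stromal (COL1A1)
SpatialFeaturePlot(spatial, features = c("EPCAM", "COL1A1"))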
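
Region-specific signatures can be illustrated without any spatial package: score each spot by the mean expression of a gene set, then average the scores within annotated regions. The base R sketch below uses invented expression values, hypothetical region labels, and two small example gene sets; a real pathway signature would be much larger and would come from a curated database.

```r
# Hypothetical normalized expression (genes x spots)
expr <- matrix(
  c(3.0, 2.8, 0.2, 0.1,    # EPCAM
    2.6, 3.1, 0.3, 0.0,    # KRT8
    0.1, 0.3, 2.9, 3.2,    # COL1A1
    0.2, 0.0, 2.5, 2.7),   # COL3A1
  nrow = 4, byrow = TRUE,
  dimnames = list(c("EPCAM", "KRT8", "COL1A1", "COL3A1"),
                  paste0("spot", 1:4)))

# Hypothetical region annotation per spot (e.g. from a pathologist)
region <- c(spot1 = "epithelium", spot2 = "epithelium",
            spot3 = "stroma",     spot4 = "stroma")

signatures <- list(epithelial = c("EPCAM", "KRT8"),
                   stromal    = c("COL1A1", "COL3A1"))

# Per-spot score = mean expression of the signature genes (spots x signatures)
spot_scores <- sapply(signatures,
                      function(g) colMeans(expr[g, , drop = FALSE]))

# Average the spot scores within each region
region_scores <- aggregate(
  as.data.frame(spot_scores),
  by = list(region = unname(region[rownames(spot_scores)])),
  FUN = mean)
region_scores
```

Because each Visium spot can contain several cells, a high score describes the mixture at that location, not a single cell.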

Best Practices for Reproducibility

Keep a sample manifest and fixed metadata schema.
Save intermediate objects (.rds or .h5ad) per major step.
Track package versions and parameters.
Use consistent QC thresholds with documented rationale.
Separate exploratory from confirmatory analyses.
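
As a minimal illustration of the checkpointing and version-tracking points above, the base R sketch below saves an object, the parameters used to produce it, and the session information side by side. The file names, the params list, and the stand-in object are all hypothetical; in a real project obj would be the Seurat object and out_dir a versioned project folder.

```r
# Hypothetical parameter record for the QC step
params <- list(step = "qc", min_features = 300,
               max_features = 7000, max_percent_mt = 15)

# Temporary directory for the demo; replace with a project directory
out_dir <- file.path(tempdir(), "checkpoints")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Stand-in for the analysis object
obj <- list(cells = paste0("cell", 1:5))

# Save the object and its parameters together, numbered by pipeline step
saveRDS(obj,    file.path(out_dir, "01_qc.rds"))
saveRDS(params, file.path(out_dir, "01_qc_params.rds"))

# Record R and package versions for this run
writeLines(capture.output(sessionInfo()),
           file.path(out_dir, "session_info.txt"))

# Reloading gives back the same object
restored <- readRDS(file.path(out_dir, "01_qc.rds"))
identical(restored, obj)
```

Numbering the files by step makes it easy to restart from any checkpoint and to see which parameters produced which object.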

Common Pitfalls

Over-filtering rare but biologically important cells.
Calling clusters as cell types without marker validation.
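Treating batch-corrected embeddings as ground truth; over-correction can erase real biological differences.
Confounding donor or sample effects with biological effects, for example by treating cells from one donor as independent replicates.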
Ignoring spatial resolution limitations during interpretation.

Suggested Practice Datasets

PBMC 3k (Seurat tutorial baseline).
10x Visium public datasets for spatial workflows.
Multi-donor immune datasets for batch/integration practice.

Summary

Combining single-cell and spatial omics provides a high-resolution framework for discovering cell states, their interactions, and their tissue context. Strong analysis depends on disciplined QC, robust integration, biologically grounded annotation, and reproducible reporting.
