scRNA-seq Troubleshooting Guide 2025

Expert solutions for common single-cell RNA sequencing analysis and cell type annotation problems

🚨 Quick Problem Identifier

Select your issue type to jump to the relevant solution:

📊 Data Quality Issues

Low cell counts, high mitochondrial genes, doublets

⚙️ Preprocessing Problems

Normalization, scaling, feature selection issues

🔗 Clustering Issues

Poor clusters, over/under-clustering, resolution problems

🏷️ Annotation Problems

Incorrect cell types, low confidence, novel populations

🔄 Batch Integration

Batch effects, integration failures, sample differences

⚡ Performance Issues

Slow processing, memory errors, crashes

📊 Data Quality Troubleshooting

Problem: Low Cell/Gene Counts

Symptoms:

  • Many cells have < 1000 detected genes
  • High percentage of cells removed during filtering
  • Poor clustering results

Solutions:

  1. Adjust Quality Thresholds:
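    # R/Seurat example
    # Lower thresholds for difficult samples; assign the result so the filter takes effect
    seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 500 & nCount_RNA > 1000)

    # Python/Scanpy example (assumes `import scanpy as sc`; both calls filter adata in place)
    sc.pp.filter_cells(adata, min_genes=500)
    sc.pp.filter_genes(adata, min_cells=10)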
  2. Check Sample Preparation: Review tissue dissociation, cell capture efficiency
  3. Consider Cell Type: Some cell types naturally have lower RNA content
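
Before lowering a threshold, it can help to see how many cells each candidate cutoff would keep. The sketch below is a minimal illustration in plain Python; the function name and the example counts are hypothetical, and the strict ">" comparison mirrors the Seurat expression above (Scanpy's min_genes keeps cells at or above the cutoff).

```python
def fraction_retained(genes_per_cell, min_genes):
    """Return the fraction of cells with more than `min_genes` detected genes."""
    if not genes_per_cell:
        raise ValueError("genes_per_cell is empty")
    kept = sum(1 for n in genes_per_cell if n > min_genes)
    return kept / len(genes_per_cell)


# Hypothetical per-cell detected-gene counts (illustration only, not real data).
example_counts = [350, 480, 620, 750, 900, 1100, 1500, 2200, 3100, 4000]

# Compare how aggressive each candidate cutoff would be.
for cutoff in (200, 500, 1000):
    print(cutoff, fraction_retained(example_counts, cutoff))
```

If a cutoff removes most of the cells, that points back to sample preparation or to a cell type with naturally low RNA content rather than to the threshold alone.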

Problem: High Mitochondrial Gene Expression

Symptoms:

  • Cells with >20% mitochondrial gene expression
  • Potential cell stress or death

Solutions:

  1. Progressive Filtering:
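    # Assumes percent.mt was computed first, e.g. PercentageFeatureSet(seurat_obj, pattern = "^MT-")
    # Start with a relaxed threshold and inspect the result
    seurat_relaxed <- subset(seurat_obj, subset = percent.mt < 25)
    # Then tighten if stressed or dying cells are still present
    seurat_strict <- subset(seurat_obj, subset = percent.mt < 15)
  2. Tissue-Specific Thresholds: As rough starting points, brain cells: <20%, blood cells: <10%
  3. Regression Approach: Regress out mitochondrial effects instead of filtering, for example with vars.to.regress = "percent.mt" in Seurat's ScaleData or SCTransform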
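
The tissue-specific thresholds can be expressed as a small lookup. This is a sketch only: the brain and blood values come from the list above, while the fallback of 25% for other tissues and all function names are assumptions made for illustration.

```python
# Thresholds in percent. Brain and blood come from the guide text; the
# fallback of 25 is the relaxed starting point and is an assumption.
MT_THRESHOLDS = {"brain": 20.0, "blood": 10.0}
DEFAULT_MT_THRESHOLD = 25.0


def percent_mt(mito_counts, total_counts):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    return 100.0 * mito_counts / total_counts


def keep_cell(mito_counts, total_counts, tissue):
    """True if the cell is below the mitochondrial threshold for its tissue."""
    limit = MT_THRESHOLDS.get(tissue, DEFAULT_MT_THRESHOLD)
    return percent_mt(mito_counts, total_counts) < limit
```

For example, a cell with 150 mitochondrial counts out of 1,000 total (15%) would pass the brain threshold but fail the blood threshold.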

⚙️ Preprocessing Troubleshooting

Problem: Poor Normalization Results

Symptoms:

  • Batch effects still visible after normalization
  • Highly variable genes dominated by ribosomal/mitochondrial genes
  • Poor clustering separation

Solutions:

  1. Try Different Normalization Methods:
    Method            | Best For               | When to Use
    Log-normalization | Standard analysis      | Default choice
    SCTransform       | Heterogeneous datasets | Strong batch effects
    scran             | Sparse data            | Many zero counts
  2. Parameter Optimization: Adjust scaling factors and regression variables
  3. Alternative Approaches: Consider SCTransform for complex datasets
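
To make the role of the scaling factor concrete, here is a minimal sketch of log-normalization for a single cell: counts are divided by the cell total, multiplied by a scale factor, and passed through log(1 + x). The 10,000 default matches the common convention; the function name and the toy counts are hypothetical.

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Scale one cell's counts to `scale_factor` total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        # An empty cell has nothing to scale; return zeros instead of dividing by 0.
        return [0.0 for _ in counts]
    return [math.log1p(c / total * scale_factor) for c in counts]


# Two hypothetical cells with the same composition but 10x different depth.
shallow = [1, 3, 0, 6]
deep = [10, 30, 0, 60]

# After normalization the depth difference disappears.
print(log_normalize(shallow))
print(log_normalize(deep))
```

Both cells end up with the same normalized profile, which is the point of the method: it removes sequencing depth differences, not batch effects.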

Problem: Feature Selection Issues

Solutions:

  1. Increase HVG Count: Try 3000-5000 highly variable genes instead of 2000
  2. Use Multiple Methods: Combine the vst, mean.var.plot, and dispersion selection methods (the selection.method options of Seurat's FindVariableFeatures)
  3. Manual Curation: Remove unwanted gene categories (ribosomal, mitochondrial)
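
Manual curation can be as simple as dropping genes by symbol prefix. The sketch below assumes human-style symbols (MT- for mitochondrial, RPS/RPL for ribosomal proteins) and matches case-insensitively so mouse-style symbols are caught too; the function name and gene list are hypothetical.

```python
# Prefix matching is crude: for example, RPS6KA1 (a kinase, not a ribosomal
# protein) would also be dropped. Review the removed genes before relying on it.
UNWANTED_PREFIXES = ("MT-", "RPS", "RPL")


def curate_hvgs(genes, unwanted_prefixes=UNWANTED_PREFIXES):
    """Drop genes whose symbol starts with any unwanted prefix, keeping order."""
    prefixes = tuple(p.upper() for p in unwanted_prefixes)
    return [g for g in genes if not g.upper().startswith(prefixes)]


# Hypothetical highly variable gene list.
hvgs = ["CD3E", "MT-CO1", "RPS27", "MS4A1", "RPL13", "mt-Nd1", "LYZ"]
print(curate_hvgs(hvgs))
```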

🔗 Clustering Troubleshooting

Problem: Over-clustering (Too Many Small Clusters)

Solutions:

  1. Reduce Resolution:
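    # Try lower resolution values; assign the result to keep the new clusters
    seurat_obj <- FindClusters(seurat_obj, resolution = 0.3)  # Instead of 0.8
    seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)  # Middle ground
  2. Increase k Parameter: Use more neighbors in SNN graph construction (k.param in Seurat's FindNeighbors, default 20)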
  3. Merge Similar Clusters: Use hierarchical clustering to identify merge candidates
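
One simple way to shortlist merge candidates is to compare cluster centroids. The sketch below swaps full hierarchical clustering for a pairwise Pearson correlation between mean expression profiles; the 0.95 cutoff, the function names, and the toy data are assumptions for illustration.

```python
import math
from itertools import combinations


def centroid(rows):
    """Mean expression vector across the cells (rows) of one cluster."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # A constant profile has no defined correlation.
    return cov / (sx * sy)


def merge_candidates(clusters, min_corr=0.95):
    """Return (a, b, r) for cluster pairs whose centroids correlate >= min_corr."""
    cents = {name: centroid(rows) for name, rows in clusters.items()}
    pairs = []
    for a, b in combinations(sorted(cents), 2):
        r = pearson(cents[a], cents[b])
        if r >= min_corr:
            pairs.append((a, b, r))
    return pairs


# Hypothetical normalized expression (cells x genes) per cluster:
# clusters "0" and "1" look alike, cluster "2" is distinct.
clusters = {
    "0": [[5.0, 0.1, 2.0, 0.0], [4.8, 0.0, 2.2, 0.1]],
    "1": [[5.1, 0.2, 1.9, 0.0], [4.9, 0.1, 2.1, 0.0]],
    "2": [[0.1, 4.0, 0.0, 3.5], [0.0, 4.2, 0.1, 3.3]],
}
print([(a, b) for a, b, _ in merge_candidates(clusters)])
```

Pairs flagged this way are only candidates; check marker genes before merging, since highly correlated centroids can still hide a biologically distinct subpopulation.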

Problem: Under-clustering (Missing Cell Types)

Solutions:

  1. Increase Resolution: Try 0.8, 1.0, or higher
  2. Adjust PCA Dimensions: Use more PCs (30-50 instead of 20)
  3. Re-examine Preprocessing: Check if important genes were filtered out
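
When deciding how many PCs to use, one heuristic (alongside inspecting an elbow plot) is to take the smallest number of leading PCs that reaches a target share of variance. The sketch below assumes you already have per-PC variance ratios from your PCA step; the 0.8 target, the function name, and the example ratios are hypothetical.

```python
def n_pcs_for_variance(variance_ratios, target=0.8):
    """Smallest number of leading PCs whose cumulative variance ratio reaches `target`."""
    cumulative = 0.0
    for i, ratio in enumerate(variance_ratios, start=1):
        cumulative += ratio
        if cumulative >= target:
            return i
    # The target was never reached: fall back to all available PCs.
    return len(variance_ratios)


# Hypothetical variance ratios for the first nine PCs (illustration only).
ratios = [0.30, 0.20, 0.15, 0.10, 0.08, 0.07, 0.05, 0.03, 0.02]
print(n_pcs_for_variance(ratios))
```

If the suggested number sits at the upper end of what you computed, that is a hint to compute more PCs and re-check before clustering.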

🏷️ Cell Type Annotation Troubleshooting

Problem: Incorrect Cell Type Assignments

Symptoms:

  • Known markers not matching assigned cell types
  • Biologically implausible results
  • Low confidence scores

Solutions with mLLMCelltype:

  1. Enable Multi-Model Consensus:

    🎯 Pro Tip: Use 3-5 different AI models and set the consensus threshold to 0.7 for higher confidence

  2. Use Discussion Mode: Let models debate uncertain annotations
  3. Provide Better Context: Include tissue type and experimental condition information
  4. Refine Marker Genes: Use more specific, high-quality marker genes
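
To show what a consensus threshold means in practice, here is a minimal majority-vote sketch in plain Python. It is not mLLMCelltype's API or its actual algorithm; the function name, the label normalization, and the "Unknown" fallback are assumptions for illustration.

```python
from collections import Counter


def consensus_label(predictions, threshold=0.7, fallback="Unknown"):
    """Return (label, agreement), where agreement is the top label's vote share.

    If the share is below `threshold`, the label is replaced by `fallback`.
    """
    if not predictions:
        raise ValueError("predictions is empty")
    # Normalize case and whitespace so "T cell" and "t cell " count as one label.
    normalized = [p.strip().lower() for p in predictions]
    label, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    return (label if agreement >= threshold else fallback, agreement)


# Hypothetical predictions from five models for one cluster: 4 of 5 agree.
print(consensus_label(["T cell", "T cell", "t cell ", "T cell", "NK cell"]))

# With three models, a 2-of-3 vote (about 0.67) falls below 0.7.
print(consensus_label(["B cell", "B cell", "Plasma cell"]))
```

Note the interaction between model count and threshold: with only three models, a 0.7 threshold effectively requires unanimity, whereas with five models four matching votes are enough.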

Problem: Novel or Rare Cell Types Not Recognized

Solutions:

  1. Lower Consensus Threshold: Allow more exploratory annotations
  2. Manual Review: Examine clusters with "Unknown" annotations
  3. Literature Search: Research potential novel populations in your tissue
  4. Functional Analysis: Perform pathway analysis to understand cell function

🔄 Batch Integration Troubleshooting

Problem: Strong Batch Effects Persist

Integration Method Comparison:

Method    | Strength              | Best For
Harmony   | Fast, robust          | Large datasets
Scanorama | Panoramic integration | Diverse samples
CCA/RPCA  | Seurat native         | Similar protocols
scVI      | Deep learning         | Complex batch effects
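
Whichever method you choose, a quick check for persisting batch effects is to look at the batch composition of each cluster. The sketch below reports, per cluster, the share of cells from its largest batch; the function name and the example labels are hypothetical, and what counts as "too dominant" is left to your judgment.

```python
from collections import Counter, defaultdict


def dominant_batch_fraction(cluster_labels, batch_labels):
    """For each cluster, the share of its cells that come from its largest batch."""
    if len(cluster_labels) != len(batch_labels):
        raise ValueError("labels must have the same length")
    per_cluster = defaultdict(Counter)
    for cluster, batch in zip(cluster_labels, batch_labels):
        per_cluster[cluster][batch] += 1
    return {
        cluster: max(counts.values()) / sum(counts.values())
        for cluster, counts in per_cluster.items()
    }


# Hypothetical labels: cluster "A" mixes both batches, cluster "B" is batch-specific.
clusters = ["A", "A", "A", "A", "B", "B", "B", "B"]
batches = ["s1", "s2", "s1", "s2", "s1", "s1", "s1", "s1"]
print(dominant_batch_fraction(clusters, batches))
```

A cluster made up almost entirely of one batch is worth a closer look, but it is not proof of a technical artifact: a cell type that is genuinely present in only one sample produces the same pattern.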

⚡ Performance Troubleshooting

Problem: Memory Errors and Crashes

Solutions:

  1. Reduce Data Size: Subsample cells or genes for initial analysis
  2. Use Disk-Based Storage: Enable on-disk storage for large objects, for example AnnData's backed mode (sc.read_h5ad(path, backed="r"))
  3. Optimize Parameters: Reduce PCA dimensions, use fewer HVGs
  4. Cloud Computing: Use the mLLMCelltype web platform for processing
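
A back-of-the-envelope estimate shows why reducing genes helps so much. The sketch below computes the size of a dense cells x genes matrix; the function name and the example dimensions are hypothetical, and real objects are often sparse and therefore smaller until a step such as scaling densifies them.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=8):
    """Approximate size in GiB of a dense cells x genes matrix (float64 by default)."""
    return n_cells * n_genes * bytes_per_value / 2**30


full = dense_matrix_gib(100_000, 20_000)     # all genes
hvg_only = dense_matrix_gib(100_000, 2_000)  # restricted to 2,000 HVGs
print(f"{full:.1f} GiB vs {hvg_only:.1f} GiB")
```

For 100,000 cells, a dense matrix over 20,000 genes needs about 14.9 GiB, while restricting to 2,000 HVGs brings that down to about 1.5 GiB.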

🛡️ Prevention Best Practices

📋 Quality Control Checklist

  • Always plot QC metrics before filtering
  • Use tissue-appropriate thresholds
  • Document all parameter choices
  • Save intermediate analysis steps

🔍 Validation Steps

  • Cross-check with known markers
  • Validate with external datasets
  • Use multiple annotation methods
  • Manual review of uncertain clusters

📚 Documentation

  • Record all software versions
  • Document parameter settings
  • Keep analysis notebooks organized
  • Note troubleshooting steps taken
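
Recording versions and parameters can be automated. The sketch below uses only the Python standard library to write them to a JSON file; the function names, the file layout, and the example package names and parameters are assumptions, not a prescribed format.

```python
import json
import platform
from importlib import metadata


def collect_versions(packages):
    """Map each package name to its installed version, or None if not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


def save_run_record(path, packages, parameters):
    """Write software versions and analysis parameters to a JSON file."""
    record = {"software": collect_versions(packages), "parameters": parameters}
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2, sort_keys=True)
    return record
```

A call such as save_run_record("run_record.json", ["scanpy", "anndata"], {"min_genes": 500, "resolution": 0.5}) at the end of each analysis keeps the parameter choices next to the results they produced.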

Still Having Issues?

Try mLLMCelltype's AI-powered annotation for more accurate results
