📊 Data Quality Troubleshooting

Problem: Low Cell/Gene Counts

Symptoms:

Many cells have < 1000 detected genes
High percentage of cells removed during filtering
Poor clustering results

Solutions:

Adjust Quality Thresholds:

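# R/Seurat example
# Lower thresholds for difficult samples; assign the result, since subset() returns a new object
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 500 & nCount_RNA > 1000)

# Python/Scanpy example
import scanpy as sc
sc.pp.filter_cells(adata, min_genes=500)  # drop cells with fewer than 500 detected genes
sc.pp.filter_genes(adata, min_cells=10)   # drop genes detected in fewer than 10 cells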

Check Sample Preparation: Review tissue dissociation, cell capture efficiency
Consider Cell Type: Some cell types naturally have lower RNA content
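
Before lowering a threshold, it can help to see how many cells each candidate cutoff would remove. The following is a minimal sketch in plain Python, assuming the per-cell gene and UMI counts have been exported as two parallel lists; the function name `fraction_removed` and the toy numbers are illustrative and not part of Seurat or Scanpy.

```python
def fraction_removed(n_genes, n_counts, min_genes, min_counts):
    """Return the fraction of cells that fail either cutoff.

    n_genes and n_counts are parallel per-cell lists (for example, values
    exported from nFeature_RNA and nCount_RNA). Hypothetical helper.
    """
    if len(n_genes) != len(n_counts):
        raise ValueError("n_genes and n_counts must have the same length")
    if not n_genes:
        return 0.0
    removed = sum(
        1
        for genes, counts in zip(n_genes, n_counts)
        if not (genes > min_genes and counts > min_counts)
    )
    return removed / len(n_genes)


# Toy per-cell metrics (made-up numbers, for illustration only).
genes = [350, 620, 900, 1500, 2400, 480]
counts = [800, 1300, 2100, 4000, 9000, 1100]

# Compare a strict and a relaxed setting before committing to either.
for min_genes, min_counts in [(1000, 2000), (500, 1000)]:
    frac = fraction_removed(genes, counts, min_genes, min_counts)
    print(f"min_genes={min_genes}, min_counts={min_counts}: {frac:.0%} removed")
```

If a relaxed setting still removes most cells, the cause is more likely sample preparation or cell type than the threshold itself.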

Problem: High Mitochondrial Gene Expression

Symptoms:

Cells with >20% mitochondrial gene expression
Potential cell stress or death

Solutions:

Progressive Filtering:

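# percent.mt must exist first, e.g. for human data:
# seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
# Start with a relaxed threshold
seurat_obj <- subset(seurat_obj, subset = percent.mt < 25)
# Then tighten based on the results
seurat_obj <- subset(seurat_obj, subset = percent.mt < 15)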

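Tissue-Specific Thresholds: Set the cutoff per tissue, for example <20% for brain cells and <10% for blood cells
Regression Approach: Regress out the mitochondrial percentage during scaling (for example via vars.to.regress in Seurat's ScaleData) instead of removing cells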
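
The progressive approach can also be tried on exported counts before touching the analysis object. This is a small sketch in plain Python, assuming each cell is described by its mitochondrial and total UMI counts; `percent_mt` and `progressive_filter` are hypothetical helper names and the cell data is made up.

```python
def percent_mt(mt_counts, total_counts):
    """Percentage of a cell's counts that come from mitochondrial genes."""
    return 100.0 * mt_counts / total_counts if total_counts else 0.0


def progressive_filter(cells, thresholds):
    """Apply each percent-mt cutoff and report which cells survive it.

    cells: list of (cell_id, mt_counts, total_counts) tuples.
    thresholds: cutoffs in percent, ordered from relaxed to strict.
    Returns {cutoff: [cell_ids kept at that cutoff]}.
    """
    kept = {}
    for cutoff in thresholds:
        kept[cutoff] = [
            cell_id
            for cell_id, mt, total in cells
            if percent_mt(mt, total) < cutoff
        ]
    return kept


# Toy cells with 5%, 18%, 30% and 12% mitochondrial counts.
cells = [("c1", 50, 1000), ("c2", 180, 1000), ("c3", 300, 1000), ("c4", 120, 1000)]
result = progressive_filter(cells, [25, 15])
for cutoff, ids in result.items():
    print(f"percent.mt < {cutoff}: kept {len(ids)} of {len(cells)} cells")
```

Comparing the survivor lists at each cutoff shows which cells the stricter threshold costs, which is the information needed to decide whether tightening is worth it for a given tissue.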

⚙️ Preprocessing Troubleshooting

Problem: Poor Normalization Results

Symptoms:

Batch effects still visible after normalization
Highly variable genes dominated by ribosomal/mitochondrial genes
Poor clustering separation

Solutions:

Try Different Normalization Methods:

Method	Best For	When to Use
Log-normalization	Standard analysis	Default choice
SCTransform	Heterogeneous datasets	Strong batch effects
scran	Sparse data	Many zero counts

Parameter Optimization: Adjust the scale factor (10,000 by default in Seurat's NormalizeData) and the variables that are regressed out
Alternative Approaches: Consider SCTransform for complex datasets

Problem: Feature Selection Issues

Solutions:

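Increase HVG Count: Try 3000-5000 highly variable genes instead of the default 2000
Use Multiple Methods: Compare the vst, mean.var.plot, and dispersion selection methods (the selection.method options of Seurat's FindVariableFeatures) and combine their results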
Manual Curation: Remove unwanted gene categories (ribosomal, mitochondrial)
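
Manual curation of the HVG list can be sketched as a prefix filter. The example below is plain Python and assumes human gene symbols, where mitochondrial genes start with MT- and ribosomal protein genes with RPS or RPL; the function name `curate_hvgs` is hypothetical. Prefix matching is coarse, so the dropped list is returned for review.

```python
import re

# Prefixes follow human gene-symbol conventions; adjust for other species
# (for example "mt-" in mouse). This pattern is an assumption of the sketch.
UNWANTED = re.compile(r"^(MT-|RPS|RPL)")


def curate_hvgs(hvgs):
    """Split a gene list into (kept, dropped) using the prefix pattern."""
    kept = [gene for gene in hvgs if not UNWANTED.match(gene)]
    dropped = [gene for gene in hvgs if UNWANTED.match(gene)]
    return kept, dropped


# Toy HVG list. RPS6KA1 encodes a kinase, not a ribosomal protein, and shows
# why the dropped list should be checked by hand.
hvgs = ["CD3E", "MT-CO1", "RPS12", "RPL13A", "MS4A1", "RPS6KA1"]
kept, dropped = curate_hvgs(hvgs)
print("kept:", kept)
print("dropped (review before discarding):", dropped)
```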

🔗 Clustering Troubleshooting

Problem: Over-clustering (Too Many Small Clusters)

Solutions:

Reduce Resolution:

# Try lower resolution values; assign the result so the new clusters are kept
seurat_obj <- FindClusters(seurat_obj, resolution = 0.3)  # Instead of 0.8
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)  # Middle ground

Increase k Parameter: Use more neighbors when building the SNN graph (k.param in Seurat's FindNeighbors)
Merge Similar Clusters: Use hierarchical clustering to identify merge candidates
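
One way to shortlist merge candidates is to compare cluster centroids. The sketch below is plain Python and uses pairwise Pearson correlation between centroids as a simpler stand-in for full hierarchical clustering; the function names, the 0.95 cutoff, and the toy centroids are assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / (sd_x * sd_y)


def merge_candidates(centroids, min_corr=0.95):
    """Return (cluster_a, cluster_b, r) for centroid pairs with r >= min_corr.

    centroids: {cluster_id: mean expression vector over the same genes}.
    """
    pairs = []
    for (a, vec_a), (b, vec_b) in combinations(sorted(centroids.items()), 2):
        r = pearson(vec_a, vec_b)
        if r >= min_corr:
            pairs.append((a, b, round(r, 3)))
    return sorted(pairs, key=lambda pair: -pair[2])


# Toy centroids over four genes (made-up values).
centroids = {
    "0": [5.0, 0.1, 0.2, 3.0],
    "1": [4.8, 0.2, 0.1, 3.1],
    "2": [0.1, 4.0, 3.5, 0.2],
}
candidates = merge_candidates(centroids)
print(candidates)
```

A high centroid correlation only flags a pair for inspection; marker genes should still be checked before two clusters are actually merged.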

Problem: Under-clustering (Missing Cell Types)

Solutions:

Increase Resolution: Try 0.8, 1.0, or higher
Adjust PCA Dimensions: Use more PCs (30-50 instead of 20)
Re-examine Preprocessing: Check if important genes were filtered out

🏷️ Cell Type Annotation Troubleshooting

Problem: Incorrect Cell Type Assignments

Symptoms:

Known markers not matching assigned cell types
Biologically implausible results
Low confidence scores

Solutions with mLLMCelltype:

Enable Multi-Model Consensus: Use 3-5 different AI models and set the consensus threshold to 0.7 for higher confidence
Use Discussion Mode: Let models debate uncertain annotations
Provide Better Context: Include tissue type and experimental condition information
Refine Marker Genes: Use more specific, high-quality marker genes
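
The idea behind a consensus threshold can be illustrated with a simple majority vote. This is a sketch in plain Python, not the mLLMCelltype API: the function name `consensus_annotation`, the exact-match comparison of labels, and the example votes are all assumptions.

```python
from collections import Counter


def consensus_annotation(votes, threshold=0.7):
    """Return (label, agreement) for one cluster.

    votes: cell type labels proposed by different models for the same cluster.
    The most common label is accepted only if its share of the votes reaches
    the threshold; otherwise the cluster is reported as "Unknown".
    """
    if not votes:
        return "Unknown", 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement >= threshold:
        return label, agreement
    return "Unknown", agreement


# Five hypothetical model outputs per cluster.
clear_votes = ["T cell", "T cell", "T cell", "NK cell", "T cell"]
mixed_votes = ["B cell", "Plasma cell", "B cell", "Monocyte", "Plasma cell"]

print(consensus_annotation(clear_votes))  # 4 of 5 models agree
print(consensus_annotation(mixed_votes))  # no label reaches 0.7
```

Clusters that fall below the threshold are the ones worth sending to discussion mode or manual review.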

Problem: Novel or Rare Cell Types Not Recognized

Solutions:

Lower Consensus Threshold: Allow more exploratory annotations
Manual Review: Examine clusters with "Unknown" annotations
Literature Search: Research potential novel populations in your tissue
Functional Analysis: Perform pathway analysis to understand cell function

🔄 Batch Integration Troubleshooting

Problem: Strong Batch Effects Persist

Integration Method Comparison:

Method	Strength	Best For
Harmony	Fast, robust	Large datasets
Scanorama	Panoramic integration	Diverse samples
CCA/RPCA	Seurat native	Similar protocols
scVI	Deep learning	Complex batch effects
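
Whichever method is chosen, a quick check of batch composition per cluster can show whether batch effects persist after integration. The sketch below is plain Python and assumes per-cell cluster and batch labels exported as parallel lists; `batch_dominance` is a hypothetical helper, not part of any of the tools listed.

```python
from collections import Counter, defaultdict


def batch_dominance(clusters, batches):
    """For each cluster, return the largest share held by a single batch.

    A value near 1.0 means the cluster consists almost entirely of one batch.
    """
    if len(clusters) != len(batches):
        raise ValueError("clusters and batches must have the same length")
    per_cluster = defaultdict(Counter)
    for cluster, batch in zip(clusters, batches):
        per_cluster[cluster][batch] += 1
    return {
        cluster: max(counts.values()) / sum(counts.values())
        for cluster, counts in per_cluster.items()
    }


# Toy labels: cluster "0" is mixed, cluster "1" contains only batch A.
clusters = ["0", "0", "0", "0", "1", "1", "1", "1"]
batches = ["A", "B", "A", "B", "A", "A", "A", "A"]
dominance = batch_dominance(clusters, batches)
print(dominance)
```

A cluster dominated by one batch is not proof of a technical artifact, since a cell type may genuinely be present in only one sample, so this is a prompt for inspection rather than a verdict.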

⚡ Performance Troubleshooting

Problem: Memory Errors and Crashes

Solutions:

Reduce Data Size: Subsample cells or genes for initial analysis
Use Disk-Based Storage: Enable on-disk storage for large objects
Optimize Parameters: Reduce PCA dimensions, use fewer HVGs
Cloud Computing: Use mLLMCelltype web platform for processing
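
Subsampling for an initial pass can be as simple as drawing a fixed number of cell barcodes with a fixed seed, so the same subset can be reproduced later. This is a minimal sketch in plain Python; `subsample_cells` and the barcode names are illustrative.

```python
import random


def subsample_cells(cell_ids, n, seed=0):
    """Return n cell IDs drawn without replacement, reproducibly.

    If n is at least the number of cells, all cells are returned unchanged.
    """
    cell_ids = list(cell_ids)
    if n >= len(cell_ids):
        return cell_ids
    rng = random.Random(seed)  # local generator, leaves global state untouched
    return rng.sample(cell_ids, n)


# Toy barcodes; in practice these would come from the dataset's cell names.
cells = [f"cell_{i}" for i in range(1000)]
subset = subsample_cells(cells, 100, seed=42)
print(len(subset), "cells selected")
```

Recording the seed alongside the other parameters makes it possible to rerun the exploratory analysis on exactly the same cells.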

🛡️ Prevention Best Practices

📋 Quality Control Checklist

Always plot QC metrics before filtering
Use tissue-appropriate thresholds
Document all parameter choices
Save intermediate analysis steps

🔍 Validation Steps

Cross-check with known markers
Validate with external datasets
Use multiple annotation methods
Manual review of uncertain clusters

📚 Documentation

Record all software versions
Document parameter settings
Keep analysis notebooks organized
Note troubleshooting steps taken
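
Recording versions and parameters can be automated at the start of a notebook. The sketch below uses only the Python standard library; the function name `analysis_record`, the package list, and the parameter names are assumptions chosen for illustration.

```python
import json
import platform
from importlib import metadata


def analysis_record(packages, params):
    """Collect interpreter, platform, package versions and parameters."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "params": params,
    }


# Example parameter choices taken from this guide; adjust to your analysis.
record = analysis_record(
    ["scanpy", "anndata", "numpy"],
    {"min_genes": 500, "min_cells": 10, "max_percent_mt": 15, "resolution": 0.5},
)
print(json.dumps(record, indent=2, sort_keys=True))
```

Writing this record to a file next to each saved intermediate result keeps the parameter history tied to the data it produced.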

scRNA-seq Troubleshooting Guide 2025


