📊 Data Quality Troubleshooting
Problem: Low Cell/Gene Counts
Symptoms:
- Many cells have < 1000 detected genes
- High percentage of cells removed during filtering
- Poor clustering results
Solutions:
- Adjust Quality Thresholds:
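# R/Seurat example: lower thresholds for difficult samples
library(Seurat)
seurat_obj <- subset(seurat_obj, subset = nFeature_RNA > 500 & nCount_RNA > 1000)
# Python/Scanpy example (each filter_cells call accepts only one criterion)
import scanpy as sc
sc.pp.filter_cells(adata, min_genes=500)
sc.pp.filter_cells(adata, min_counts=1000)
sc.pp.filter_genes(adata, min_cells=10)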
- Check Sample Preparation: Review tissue dissociation and cell capture efficiency
- Consider Cell Type: Some cell types naturally have lower RNA content
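To make the thresholds above concrete, here is a minimal pure-Python sketch of what cell and gene filtering computes. It assumes a small dense cells × genes count matrix stored as nested lists; real data would be a sparse matrix inside a Seurat or AnnData object. The function name `qc_filter` is hypothetical. Thresholds are inclusive, as with Scanpy's `min_*` parameters.

```python
def qc_filter(counts, gene_names, min_genes=500, min_counts=1000, min_cells=10):
    """Return (kept_cell_indices, kept_gene_names) for a dense cells x genes count matrix.

    Cells are filtered first. Genes are then kept if detected in at least
    `min_cells` of the retained cells. All thresholds are inclusive (>=).
    """
    # Detected genes = nonzero entries in the row; library size = row total
    kept_cells = [
        i
        for i, row in enumerate(counts)
        if sum(1 for x in row if x > 0) >= min_genes and sum(row) >= min_counts
    ]
    # Count how many retained cells detect each gene
    kept_genes = [
        name
        for j, name in enumerate(gene_names)
        if sum(1 for i in kept_cells if counts[i][j] > 0) >= min_cells
    ]
    return kept_cells, kept_genes
```

Because genes are counted only in the retained cells, the order matters. This matches the cell-then-gene order of the examples above.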
Problem: High Mitochondrial Gene Expression
Symptoms:
- Cells with >20% mitochondrial gene expression
- Potential cell stress or death
Solutions:
- Progressive Filtering:
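# Compute per-cell mitochondrial percentage first (human "^MT-"; use "^mt-" for mouse)
seurat_obj[["percent.mt"]] <- PercentageFeatureSet(seurat_obj, pattern = "^MT-")
# Start with a relaxed threshold
relaxed <- subset(seurat_obj, subset = percent.mt < 25)
# Then tighten based on results, keeping the relaxed object for comparison
strict <- subset(seurat_obj, subset = percent.mt < 15)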
- Tissue-Specific Thresholds: For example, brain cells: <20%; blood cells: <10%
- Regression Approach: Regress out mitochondrial effects instead of filtering (e.g., vars.to.regress = "percent.mt" in ScaleData or SCTransform)
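The per-cell mitochondrial percentage behind these thresholds is the share of counts that come from mitochondrial genes. The sketch below assumes those genes can be identified by an `MT-` prefix, matched case-insensitively so that mouse `mt-` symbols also count. It hard-codes the example brain and blood cutoffs from this section. The 25% fallback for other tissues is the relaxed starting value, not a recommendation. `percent_mt` and `keep_by_mt` are hypothetical helpers.

```python
MT_CUTOFFS = {"brain": 20.0, "blood": 10.0}  # example values from this section


def percent_mt(counts, gene_names, prefix="MT-"):
    """Per-cell percentage of counts from genes whose symbol starts with `prefix` (case-insensitive)."""
    mt_cols = [j for j, g in enumerate(gene_names) if g.upper().startswith(prefix.upper())]
    result = []
    for row in counts:
        total = sum(row)
        # Empty cells get 0%; the count/gene filters should remove them anyway
        result.append(100.0 * sum(row[j] for j in mt_cols) / total if total > 0 else 0.0)
    return result


def keep_by_mt(pct, tissue, fallback=25.0):
    """Indices of cells strictly below the tissue's cutoff (fallback for unlisted tissues)."""
    cutoff = MT_CUTOFFS.get(tissue, fallback)
    return [i for i, p in enumerate(pct) if p < cutoff]
```

You can mirror the progressive strategy by calling `keep_by_mt(pct, "other", fallback=25.0)` and then `fallback=15.0`, and comparing how many cells each cutoff removes.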
⚙️ Preprocessing Troubleshooting
Problem: Poor Normalization Results
Symptoms:
- Batch effects still visible after normalization
- Highly variable genes dominated by ribosomal/mitochondrial genes
- Poor clustering separation
Solutions:
- Try Different Normalization Methods:
Method | Best For | When to Use |
---|---|---|
Log-normalization | Standard analysis | Default choice |
SCTransform | Heterogeneous datasets | Strong batch effects |
scran | Sparse data | Many zero counts |
- Parameter Optimization: Adjust the scale factor (scale.factor in NormalizeData, default 10,000) and the regression variables (vars.to.regress)
- Alternative Approaches: Consider SCTransform for complex datasets
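Log-normalization, listed as the default choice, works in three steps. It divides each cell's counts by that cell's total, multiplies by a scale factor, and applies a natural `log1p`. This corresponds to Seurat's `LogNormalize` with its default `scale.factor` of 10,000, and to Scanpy's `normalize_total(target_sum=1e4)` followed by `log1p`. Here is a minimal sketch on nested lists, assuming a dense cells × genes matrix:

```python
import math


def log_normalize(counts, scale_factor=10_000):
    """Scale each cell to `scale_factor` total counts, then apply the natural log1p."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # Leave empty cells at zero instead of dividing by zero
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(x / total * scale_factor) for x in row])
    return normalized
```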
Problem: Feature Selection Issues
Solutions:
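- Increase HVG Count: Try 3000-5000 highly variable genes instead of the default 2000
- Use Multiple Methods: Combine the vst, mean.var.plot, and dispersion approaches (the selection.method options of Seurat's FindVariableFeatures)
- Manual Curation: Remove unwanted gene categories (ribosomal, mitochondrial) from the highly variable gene list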
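Feature selection and manual curation can be combined by excluding unwanted gene families before ranking. The sketch below ranks genes by a simple dispersion: the variance divided by the mean of normalized expression. It is a simplified stand-in, not Seurat's `vst` method. The prefix list is an assumption and a crude one. `RPS` and `RPL` also match non-ribosomal genes such as `RPS6KA1`, so a curated gene list is safer in practice.

```python
import statistics

# Assumed prefixes for mitochondrial and ribosomal protein genes (crude prefix match)
EXCLUDE_PREFIXES = ("MT-", "RPS", "RPL")


def select_hvgs(expr, gene_names, n_top=2000, exclude=EXCLUDE_PREFIXES):
    """Top `n_top` genes by variance/mean of normalized expr (cells x genes), after exclusions."""
    scored = []
    for j, name in enumerate(gene_names):
        if name.upper().startswith(exclude):
            continue  # manual curation: skip ribosomal/mitochondrial genes
        column = [row[j] for row in expr]
        mean = statistics.fmean(column)
        if mean == 0:
            continue  # undetected gene carries no variability information
        scored.append((statistics.variance(column) / mean, name))  # sample variance, needs >= 2 cells
    scored.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scored[:n_top]]
```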
🔗 Clustering Troubleshooting
Problem: Over-clustering (Too Many Small Clusters)
Solutions:
- Reduce Resolution:
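# Try lower resolution values (requires FindNeighbors() to have been run)
seurat_obj <- FindClusters(seurat_obj, resolution = 0.3)  # instead of the default 0.8
seurat_obj <- FindClusters(seurat_obj, resolution = 0.5)  # middle ground
# Each run is also kept as its own meta.data column (e.g. RNA_snn_res.0.3)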
- Increase k Parameter: Use more neighbors in SNN graph construction (k.param in FindNeighbors; default 20)
- Merge Similar Clusters: Use hierarchical clustering of cluster averages (e.g., Seurat's BuildClusterTree) to identify merge candidates
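A larger k helps against over-clustering because of how the SNN graph is built. Two cells are linked by the overlap (Jaccard index) of their k-nearest-neighbor sets. Larger neighborhoods overlap more, so they bridge small groups. The pure-Python sketch below builds such a graph using brute-force distances, so it is suitable only for toy data. The weak-edge cutoff of 1/15 is modeled on the default of Seurat's `prune.SNN`, but this is an illustration, not Seurat's implementation.

```python
import math


def knn_sets(points, k):
    """k nearest neighbours of each point by Euclidean distance, counting the point itself."""
    sets = []
    for p in points:
        order = sorted(range(len(points)), key=lambda j: math.dist(p, points[j]))
        sets.append(set(order[:k]))
    return sets


def snn_graph(points, k, prune=1 / 15):
    """Jaccard-weighted shared-nearest-neighbour edges {(i, j): weight}; weights <= prune are dropped."""
    nn = knn_sets(points, k)
    edges = {}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            weight = len(nn[i] & nn[j]) / len(nn[i] | nn[j])
            if weight > prune:
                edges[(i, j)] = weight
    return edges
```

Take six points on a line forming two groups of three, `[(0,), (1,), (2,), (10,), (11,), (12,)]`. With k = 3 there are no edges between the groups. With k = 4, cross-group edges such as (0, 3) appear with weight 1/3.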
Problem: Under-clustering (Missing Cell Types)
Solutions:
- Increase Resolution: Try 0.8, 1.0, or higher
- Adjust PCA Dimensions: Use more PCs (30-50 instead of 20)
- Re-examine Preprocessing: Check whether key marker genes were removed by gene filtering or excluded from the highly variable genes
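You can choose the number of PCs more systematically than a fixed 20. One option is to take the smallest number whose cumulative share of variance reaches a target, using PC standard deviations such as those shown in Seurat's `ElbowPlot`. In the sketch below, the 90% target is an arbitrary assumption. The share is computed relative to the PCs supplied, not the total variance of the data.

```python
def n_pcs_for_variance(stdevs, target=0.9):
    """Smallest number of leading PCs whose share of the supplied PCs' variance reaches `target`."""
    variances = [s ** 2 for s in stdevs]  # variance = standard deviation squared
    total = sum(variances)
    cumulative = 0.0
    for i, v in enumerate(variances, start=1):
        cumulative += v
        if cumulative / total >= target:
            return i
    return len(variances)
```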
🏷️ Cell Type Annotation Troubleshooting
Problem: Incorrect Cell Type Assignments
Symptoms:
- Known markers not matching assigned cell types
- Biologically implausible results
- Low confidence scores
Solutions with mLLMCelltype:
- Enable Multi-Model Consensus:
🎯 Pro Tip: Use 3-5 different AI models and set consensus threshold to 0.7 for higher confidence
- Use Discussion Mode: Let models debate uncertain annotations
- Provide Better Context: Include tissue type and experimental condition information
- Refine Marker Genes: Use more specific, high-quality marker genes
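Consensus annotation comes down to measuring agreement between models. The hypothetical `consensus_vote` function below is not mLLMCelltype's API. It only illustrates what a 0.7 threshold means for different panel sizes:

- With 3 models, all must agree, because 2/3 ≈ 0.67 falls short.
- With 4 models, 3 must agree (0.75).
- With 5 models, 4 must agree (0.8).

Label matching here is naive, covering only case and whitespace. Synonyms such as "CD4+ T cell" versus "T helper cell" would need real harmonization.

```python
from collections import Counter


def consensus_vote(predictions, threshold=0.7):
    """Return (top_label, agreement, accepted) for one cluster's per-model labels."""
    # Naive harmonization: ignore case and surrounding whitespace only
    normalized = [p.strip().lower() for p in predictions]
    label, votes = Counter(normalized).most_common(1)[0]
    agreement = votes / len(normalized)
    return label, agreement, agreement >= threshold
```

Lowering `threshold` accepts the top label even when models split. That is the trade-off behind a more exploratory setting for rare cell types.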
Problem: Novel or Rare Cell Types Not Recognized
Solutions:
- Lower Consensus Threshold: Allow more exploratory annotations
- Manual Review: Examine clusters with "Unknown" annotations
- Literature Search: Research potential novel populations in your tissue
- Functional Analysis: Perform pathway analysis to understand cell function
🔄 Batch Integration Troubleshooting
Problem: Strong Batch Effects Persist
Integration Method Comparison:
Method | Strength | Best For |
---|---|---|
Harmony | Fast, robust | Large datasets |
scanorama | Panoramic integration | Diverse samples |
CCA/RPCA | Seurat native | Similar protocols |
scVI | Deep learning | Complex batch effects |
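Before switching integration methods, it helps to check whether batches still separate. A simple heuristic is the average fraction of each cell's k nearest neighbors, in the PCA or integrated embedding, that come from the same batch. This is a simplified relative of metrics such as kBET or LISI, not either of them.

- Values near 1.0 indicate batch-driven structure.
- Values near the batch's overall share of cells suggest good mixing.

Differences in cell-type composition between batches can also raise the value, so interpret it per cell type where possible. The function name is hypothetical.

```python
import math


def same_batch_fraction(embedding, batches, k=3):
    """Mean fraction of each cell's k nearest neighbours (excluding itself) from the same batch."""
    fractions = []
    for i, p in enumerate(embedding):
        # Brute-force neighbour search; fine for toy data only
        others = sorted(
            (j for j in range(len(embedding)) if j != i),
            key=lambda j: math.dist(p, embedding[j]),
        )
        neighbours = others[:k]
        fractions.append(sum(batches[j] == batches[i] for j in neighbours) / k)
    return sum(fractions) / len(fractions)
```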
⚡ Performance Troubleshooting
Problem: Memory Errors and Crashes
Solutions:
- Reduce Data Size: Subsample cells or genes for initial analysis
- Use Disk-Based Storage: Keep large objects on disk instead of in memory (e.g., AnnData backed mode via sc.read_h5ad(path, backed="r"))
- Optimize Parameters: Reduce PCA dimensions, use fewer HVGs
- Cloud Computing: Use the mLLMCelltype web platform for processing
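Subsampling is easiest to trust when it is stratified, so that rare populations are not lost. For scale, with illustrative numbers, a dense double-precision matrix of 100,000 cells × 30,000 genes needs 100,000 × 30,000 × 8 bytes = 24 GB before any copies are made. The sketch below samples a fixed fraction of each group, such as cluster or sample labels, while keeping a minimum number per group. The function name and defaults are assumptions.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, min_per_group=1, seed=0):
    """Return sorted indices of a reproducible subsample stratified by label.

    Each group contributes round(fraction * size) cells, but at least min_per_group
    (or the whole group if it is smaller), so rare populations are retained.
    """
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)
    chosen = []
    for members in groups.values():
        n = min(len(members), max(min_per_group, round(fraction * len(members))))
        chosen.extend(rng.sample(members, n))
    return sorted(chosen)
```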
🛡️ Prevention Best Practices
📋 Quality Control Checklist
- Always plot QC metrics before filtering
- Use tissue-appropriate thresholds
- Document all parameter choices
- Save intermediate analysis steps
🔍 Validation Steps
- Cross-check with known markers
- Validate with external datasets
- Use multiple annotation methods
- Manually review uncertain clusters
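Cross-checking against known markers can be made systematic. Score each cluster with the mean normalized expression of each marker set, then compare the best-scoring type with the assigned annotation. The helpers below are hypothetical sketches. Marker genes absent from the data are skipped. A mean-expression score is cruder than dedicated module scores such as Seurat's `AddModuleScore`.

```python
def marker_scores(expr, gene_names, clusters, markers):
    """Mean expression of each marker set per cluster.

    expr: cells x genes normalized matrix (nested lists); clusters: one label per cell;
    markers: {cell_type: [gene, ...]}. Returns {cluster: {cell_type: score}}.
    """
    index = {g: j for j, g in enumerate(gene_names)}
    scores = {}
    for cl in sorted(set(clusters)):
        cells = [i for i, c in enumerate(clusters) if c == cl]
        scores[cl] = {}
        for cell_type, genes in markers.items():
            cols = [index[g] for g in genes if g in index]  # skip markers absent from the data
            if not cols:
                continue  # no usable markers: leave this type unscored
            total = sum(expr[i][j] for i in cells for j in cols)
            scores[cl][cell_type] = total / (len(cells) * len(cols))
    return scores


def best_type(cluster_scores):
    """Highest-scoring cell type for one cluster, or None if nothing was scored."""
    return max(cluster_scores, key=cluster_scores.get, default=None)
```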
📚 Documentation
- Record all software versions
- Document parameter settings
- Keep analysis notebooks organized
- Note troubleshooting steps taken
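Software versions and parameter settings can be captured together at the start of each run. This Python sketch uses the standard library's `importlib.metadata` and prints a JSON record. The package list and parameter names are examples. R users can get comparable version information from `sessionInfo()`.

```python
import json
import platform
from datetime import datetime, timezone
from importlib import metadata


def analysis_record(packages, params):
    """Collect interpreter/package versions plus parameter settings into one dict."""
    versions = {}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = None  # not installed in this environment
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
        "parameters": params,
    }


# Example parameters taken from the thresholds discussed in this guide
record = analysis_record(
    ["scanpy", "anndata", "numpy"],
    {"min_genes": 500, "min_cells": 10, "percent_mt_max": 20},
)
print(json.dumps(record, indent=2))
```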