Frequently Asked Questions

Everything you need to know about AI-powered cell type annotation

🧬 Basic Concepts

What is cell type annotation and why is it important?

Cell type annotation is the process of identifying and labeling different cell types in single-cell RNA sequencing (scRNA-seq) data based on their gene expression patterns.

Why it matters:

  • Disease Research: Understanding which cell types are affected in diseases
  • Drug Development: Identifying target cell populations for treatments
  • Developmental Biology: Tracking cell fate decisions during development
  • Tissue Function: Understanding cellular composition and interactions

How does AI-powered cell type annotation work?

AI-powered annotation uses large language models (LLMs) whose training data includes a large body of biomedical literature:

  1. Data Input: Upload your marker gene expression data
  2. AI Analysis: Models analyze gene signatures against biomedical knowledge
  3. Pattern Recognition: AI identifies characteristic expression patterns
  4. Consensus Building: Multiple models vote on cell type predictions
  5. Confidence Scoring: Each prediction receives a confidence score
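
Steps 4 and 5 can be illustrated with a plain majority vote. This is only a minimal sketch, not the platform's actual consensus algorithm: the function `consensus_vote` and the model names are hypothetical, and the "confidence" here is simply the fraction of models that agree on the winning label.

```python
from collections import Counter

def consensus_vote(predictions):
    """Return (label, agreement) for one cluster.

    predictions: dict mapping a model name to its predicted cell type.
    agreement: fraction of models that voted for the winning label.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    # Normalize case and whitespace so "T cell" and "t cell " count as one label.
    votes = Counter(label.strip().lower() for label in predictions.values())
    label, count = votes.most_common(1)[0]
    return label, count / len(predictions)

# Hypothetical predictions from three models for a single cluster.
cluster_predictions = {
    "model_a": "T cell",
    "model_b": "T cell",
    "model_c": "NK cell",
}
label, agreement = consensus_vote(cluster_predictions)
print(label, round(agreement, 2))
```

A cluster with low agreement is a natural candidate for manual review or for another round of model discussion.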

🔬 Technical Usage

What file formats are supported for scRNA-seq data upload?

Supported formats:

  • CSV files (.csv) - Comma-separated values
  • TSV files (.tsv) - Tab-separated values
  • Excel files (.xlsx) - Microsoft Excel format

Data structure requirements:

  • Rows: Marker genes (gene symbols)
  • Columns: Cell clusters or cell types
  • Values: Expression levels, fold changes, or binary presence/absence
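
To make the expected layout concrete, the sketch below parses a small gene-by-cluster table with Python's standard `csv` module and lists the highest-valued genes per cluster. The table contents, the column names, and the helper `top_markers` are made up for illustration; the platform's own parser may behave differently.

```python
import csv
import io

# Hypothetical marker table: rows are genes, columns are clusters,
# values are log fold changes.
example_csv = """gene,cluster_0,cluster_1
CD3D,2.1,0.1
CD3E,1.8,0.0
MS4A1,0.2,2.5
CD79A,0.1,2.2
"""

def top_markers(csv_text, n=2, delimiter=","):
    """Return {cluster: [top n genes by value]} from a gene-by-cluster table."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    gene_column = reader.fieldnames[0]
    clusters = reader.fieldnames[1:]
    rows = list(reader)
    result = {}
    for cluster in clusters:
        # Rank genes from highest to lowest value within this cluster.
        ranked = sorted(rows, key=lambda row: float(row[cluster]), reverse=True)
        result[cluster] = [row[gene_column] for row in ranked[:n]]
    return result

markers = top_markers(example_csv)
print(markers)
```

For a TSV file, the same function could be called with `delimiter="\t"`.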

Which AI models can I use for cell type annotation?

Available AI Models:

  • OpenAI: GPT-4, GPT-4o, GPT-4o-mini
  • Anthropic: Claude 3.5 Sonnet, Claude 3.5 Haiku
  • Google: Gemini 1.5 Pro, Gemini 1.5 Flash
  • DeepSeek: DeepSeek V3
  • Chinese Models: Qwen, GLM-4, MiniMax, StepFun
  • OpenRouter: Access to additional models

📊 Accuracy & Performance

How accurate is multi-model consensus annotation?

Accuracy Benchmarks:

  • Single Model: 75-85% accuracy
  • Multi-Model Consensus: 85-95% accuracy
  • High-Confidence Predictions: >95% accuracy

Factors affecting accuracy:

  • Tissue type complexity
  • Marker gene quality and specificity
  • Number of models in consensus
  • Data preprocessing quality
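
Because these factors vary from dataset to dataset, it can help to measure accuracy on clusters you have already labeled by hand. The sketch below assumes exact, case-insensitive label matching, which is stricter than most published evaluations; the function `accuracy` and the example labels are hypothetical.

```python
def accuracy(predicted, reference):
    """Fraction of shared clusters whose predicted label matches the reference."""
    shared = predicted.keys() & reference.keys()
    if not shared:
        raise ValueError("no clusters in common")
    hits = sum(predicted[c].lower() == reference[c].lower() for c in shared)
    return hits / len(shared)

# Hypothetical labels: model output versus manual annotation.
predicted = {"0": "T cell", "1": "B cell", "2": "Monocyte", "3": "NK cell"}
reference = {"0": "T cell", "1": "B cell", "2": "Dendritic cell", "3": "NK cell"}
score = accuracy(predicted, reference)
print(score)
```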

🔐 Security & Privacy

Is my data secure when using the web platform?

Security Measures:

  • Encryption: All data transmission uses HTTPS/TLS encryption
  • Temporary Storage: Files are processed temporarily and deleted automatically
  • No Permanent Storage: We don't keep your research data
  • Secure APIs: All AI model APIs use secure connections
  • User Control: You decide when to download and delete results

🔧 Troubleshooting

What should I do if annotation results seem incorrect?

Troubleshooting Steps:

  1. Check Data Quality: Ensure marker genes are specific and well-defined
  2. Adjust Parameters: Lower the consensus threshold or increase the number of discussion rounds
  3. Add More Models: Use additional AI models for better consensus
  4. Enable Discussion Mode: Let models discuss and refine predictions
  5. Validate Markers: Cross-check with literature or databases like CellMarker
  6. Consider Tissue Context: Some cell types are tissue-specific

🔗 Integration & API

Can I integrate mLLMCelltype with my existing analysis pipeline?

Integration Options:

  • Python Package: Install via pip for direct integration
  • Web API: RESTful endpoints for programmatic access
  • Scanpy Integration: Native support for Scanpy workflows
  • Seurat Compatibility: Export results for R/Seurat analysis
  • Jupyter Notebooks: Interactive analysis examples

# Python package installation (run in a shell)
pip install mllmcelltype

# Basic usage
from mllmcelltype import annotate_cells
results = annotate_cells(marker_data, models=['gpt-4', 'claude-3.5'])
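
For the Seurat export route, one simple option is to write the annotations as a two-column CSV that R can read. The sketch below assumes the results can be reduced to a dictionary of cluster ID to cell type; that structure and the helper `annotations_to_csv` are assumptions for illustration, not the package's documented output format.

```python
import csv
import io

def annotations_to_csv(annotations):
    """Serialize {cluster_id: cell_type} as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["cluster", "cell_type"])
    # Sort by cluster ID so the output is stable across runs.
    for cluster, cell_type in sorted(annotations.items()):
        writer.writerow([cluster, cell_type])
    return buffer.getvalue()

# Hypothetical annotations for two clusters.
csv_text = annotations_to_csv({"0": "T cell", "1": "B cell"})
print(csv_text)
```

The resulting text can be saved to a file and loaded in R with `read.csv` before mapping the labels onto cluster identities.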
