Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Concurrency: Parallel Requests and the Queue

How Many Waiting Requests Are Tolerated

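When all parallel slots are busy, additional requests wait in a queue. OLLAMA_MAX_QUEUE sets the maximum number of requests that may wait there and defaults to 512. Once the queue is full, Ollama rejects further requests with an HTTP 503 error indicating that the server is overloaded.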
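From the client side, a full queue shows up as an HTTP 503 response, so a caller may want to retry rather than fail outright. The following is a minimal sketch of that idea, not part of Ollama itself: `send_once`, `with_backoff`, `OLLAMA_URL`, `MAX_TRIES`, and `BASE_DELAY` are hypothetical names chosen for illustration, and the request assumes the default port 11434 and the `/api/generate` endpoint.

```shell
# Hypothetical helpers: retry while the server reports overload (HTTP 503).
# OLLAMA_URL, MAX_TRIES and BASE_DELAY are illustrative names, not Ollama settings.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# send_once MODEL PROMPT [OUTFILE]
# Sends one request, saves the body to OUTFILE (default: discarded),
# and prints only the HTTP status code.
# Note: PROMPT is pasted into the JSON as-is, so it must not contain quotes.
send_once() {
  curl -s -o "${3:-/dev/null}" -w '%{http_code}' \
    "$OLLAMA_URL/api/generate" \
    -d "{\"model\": \"$1\", \"prompt\": \"$2\", \"stream\": false}"
}

# with_backoff CMD [ARGS...]
# Runs CMD until it prints a status other than 503, sleeping BASE_DELAY,
# then 2x, 4x, ... seconds between attempts. Prints the last status seen.
# Returns 0 on any non-503 status, 1 if every attempt was rejected with 503.
with_backoff() {
  local tries="${MAX_TRIES:-5}" delay="${BASE_DELAY:-1}" attempt=1 status=""
  while [ "$attempt" -le "$tries" ]; do
    status="$("$@")"
    if [ "$status" != "503" ]; then
      echo "$status"
      return 0
    fi
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
      delay=$((delay * 2))
    fi
    attempt=$((attempt + 1))
  done
  echo "$status"
  return 1
}
```

A call might look like `with_backoff send_once llama3.2 "Why is the sky blue?" reply.json`, where the model name is only an example. Any status other than 503 is handed back to the caller unchanged, so the caller still has to decide what to do with other errors.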

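# For a one-off debugging run, set it inline:
# OLLAMA_MAX_QUEUE=1000 ollama serve

# When using systemd, add it to a drop-in override (run as root):
mkdir -p /etc/systemd/system/ollama.service.d
cat > /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_MAX_QUEUE=1000"
EOF

# Reload systemd and restart Ollama so the new value takes effect:
systemctl daemon-reload
systemctl restart ollama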
