Concurrency: Parallel Requests and the Queue
OLLAMA_KEEP_ALIVE controls how long a model stays loaded. Two other variables control how many requests the loaded model handles at once and what happens when too many arrive: OLLAMA_NUM_PARALLEL
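To make the idea of parallel slots plus a waiting queue concrete, here is a toy Python model of a server that runs at most a fixed number of requests at once, lets a limited number wait for a free slot, and turns the rest away. It is a sketch under stated assumptions, not Ollama's scheduler. NUM_PARALLEL, QUEUE_LIMIT, and ToyScheduler are hypothetical names, the limits are hard-coded rather than read from environment variables, and asyncio.sleep stands in for token generation. How a real server responds to an overflowing queue is not modeled here; the toy simply returns a "rejected" string.

```python
import asyncio

# Toy model only: these are hypothetical stand-ins for server settings,
# not values read from Ollama or its environment variables.
NUM_PARALLEL = 2   # requests the loaded model may process at once
QUEUE_LIMIT = 3    # requests allowed to wait for a free slot


class ToyScheduler:
    """Hypothetical slot-and-queue scheduler used for illustration."""

    def __init__(self, num_parallel: int, queue_limit: int) -> None:
        self.slots = asyncio.Semaphore(num_parallel)
        self.queue_limit = queue_limit
        self.waiting = 0       # requests blocked on a slot
        self.active = 0        # requests currently "generating"
        self.peak_active = 0   # highest concurrency observed

    async def handle(self, request_id: int, work_seconds: float) -> str:
        # Every slot is busy and the waiting line is full: turn it away.
        if self.slots.locked() and self.waiting >= self.queue_limit:
            return f"request {request_id}: rejected (queue full)"
        self.waiting += 1
        async with self.slots:          # blocks until a slot frees up
            self.waiting -= 1
            self.active += 1
            self.peak_active = max(self.peak_active, self.active)
            try:
                await asyncio.sleep(work_seconds)  # stand-in for generation
            finally:
                self.active -= 1
        return f"request {request_id}: done"


async def main() -> list:
    scheduler = ToyScheduler(NUM_PARALLEL, QUEUE_LIMIT)
    # Six requests arrive at once: 2 run, 3 wait, 1 is turned away.
    results = await asyncio.gather(
        *(scheduler.handle(i, 0.1) for i in range(1, 7))
    )
    print(f"peak concurrent requests: {scheduler.peak_active}")
    return list(results)


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

In this toy, raising NUM_PARALLEL lets more requests run at the same time, while QUEUE_LIMIT only decides how many can wait before new arrivals are turned away.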
Local AI Engineering with Ollama
Run, understand, customize, fine-tune, and build agentic apps on your own hardware
