Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Concurrency: Parallel Requests and the Queue

OLLAMA_KEEP_ALIVE controls how long a model stays loaded. Two other variables control how many requests the loaded model handles at once and what happens when too many arrive: OLLAMA_NUM_PARALLEL
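
To make the idea of parallel slots plus a waiting queue concrete, here is a minimal toy model in Python. It is a sketch under stated assumptions, not Ollama's actual scheduler: the function names `slots_from_env` and `admit`, the `max_queue` parameter, and the fallback of one slot are all hypothetical choices made for illustration. Only the environment variable name OLLAMA_NUM_PARALLEL comes from the text above.

```python
import os


def slots_from_env(default=1):
    """Read the parallel-slot count from OLLAMA_NUM_PARALLEL.

    The fallback value is this sketch's own choice, not a documented default.
    """
    return int(os.environ.get("OLLAMA_NUM_PARALLEL", str(default)))


def admit(requests, num_parallel, max_queue):
    """Classify a burst of simultaneous requests.

    Toy model: the first `num_parallel` requests run at once, the next
    `max_queue` wait in line, and anything beyond that is turned away.
    `max_queue` is a hypothetical parameter of this sketch.
    """
    running, queued, rejected = [], [], []
    for req in requests:
        if len(running) < num_parallel:
            running.append(req)      # a free slot: handled immediately
        elif len(queued) < max_queue:
            queued.append(req)       # all slots busy: wait for one to free up
        else:
            rejected.append(req)     # queue full: the request is refused
    return running, queued, rejected


if __name__ == "__main__":
    burst = ["req-1", "req-2", "req-3", "req-4", "req-5"]
    running, queued, rejected = admit(burst, num_parallel=2, max_queue=2)
    print("running: ", running)
    print("queued:  ", queued)
    print("rejected:", rejected)
```

With two slots and a queue of two, a burst of five requests splits into two running, two waiting, and one refused. Raising the slot count moves requests from the queue into the running set, at the cost of more memory per loaded model.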
