Keep-Alive and Memory Control

When a model finishes responding, Ollama doesn't unload it. It keeps the weights in memory for a while in case another request comes in. This is the keep-alive timer, and it's why the model you "stopped" using is still showing up in ollama ps minutes later.
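To check this from a script rather than the terminal, the sketch below reads the server's list of loaded models over HTTP. It assumes Ollama is listening on its default local address and that the /api/ps response contains a "models" list whose entries carry "name" and "expires_at" fields; the helper names fetch_ps and loaded_models are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def fetch_ps(base_url: str = OLLAMA_URL) -> dict:
    """GET /api/ps from a running Ollama server and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)


def loaded_models(ps_response: dict) -> list[tuple[str, str]]:
    """Return (name, expires_at) pairs for each model still held in memory.

    Field names are assumed from the API response shape; missing fields
    fall back to "?" instead of raising.
    """
    return [
        (m.get("name", "?"), m.get("expires_at", "?"))
        for m in ps_response.get("models", [])
    ]


# Usage against a live server (not run here):
#   for name, expires_at in loaded_models(fetch_ps()):
#       print(f"{name} stays loaded until {expires_at}")
```

An empty result means nothing is loaded; a model that appears here after you have finished with it is still inside its keep-alive window.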

The default is 5 minutes of inactivity. Each request resets the clock. After the timer expires, the server unloads the model and frees the memory.
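The timer can also be set per request. As a minimal sketch, the code below builds request bodies for the /api/generate endpoint using its keep_alive field, which accepts a duration string such as "30m" or a number of seconds; 0 asks the server to unload the model right away, and a negative value keeps it loaded indefinitely. The model name "llama3.2", the default address, and the helper names are assumptions for illustration.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434"


def generate_payload(model: str, prompt: str, keep_alive="5m") -> dict:
    """Build a non-streaming generate request with an explicit keep-alive."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "keep_alive": keep_alive,  # e.g. "30m", 600 (seconds), 0, or -1
    }


def unload_payload(model: str) -> dict:
    """Build a request with no prompt that asks the server to unload the model."""
    return {"model": model, "keep_alive": 0}


def post_generate(payload: dict, base_url: str = OLLAMA_URL) -> dict:
    """POST a payload to /api/generate and return the parsed JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


# Usage against a live server (not run here):
#   post_generate(generate_payload("llama3.2", "Hello", keep_alive="30m"))
#   post_generate(unload_payload("llama3.2"))  # free the memory immediately
```

Because each request resets the clock, the keep_alive value on the most recent request is the one that determines when the model is unloaded.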

You can see the countdown in the UNTIL

Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware
