Keep-Alive and Memory Control
When a model finishes responding, Ollama doesn't unload it. It keeps the weights in memory for a while in case another request comes in. This is the keep-alive timer, and it's why the model you "stopped" using is still showing up in ollama ps minutes later.
The default is 5 minutes of inactivity. Each request resets the clock. After the timer expires, the server unloads the model and frees the memory.
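The reset-on-request behavior described above can be sketched as a toy model. This is only an illustration of the timer logic, not Ollama's actual implementation: the `KeepAliveTimer` class and its method names are hypothetical, and time is passed in as plain seconds so the example runs without a server.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_KEEP_ALIVE_S = 5 * 60  # default from the text: 5 minutes of inactivity


@dataclass
class KeepAliveTimer:
    """Toy model of the keep-alive countdown (hypothetical, not Ollama's code)."""

    keep_alive_s: float = DEFAULT_KEEP_ALIVE_S
    expires_at: Optional[float] = None  # None means the model is not loaded

    def on_request(self, now: float) -> None:
        # Every request restarts the countdown from the current time.
        self.expires_at = now + self.keep_alive_s

    def is_loaded(self, now: float) -> bool:
        # The model stays in memory until the deadline passes.
        return self.expires_at is not None and now < self.expires_at


timer = KeepAliveTimer()
timer.on_request(now=0)            # first request loads the model
print(timer.is_loaded(now=240))    # True: only 4 minutes idle
timer.on_request(now=240)          # a second request resets the clock
print(timer.is_loaded(now=500))    # True: new deadline is 240 + 300 = 540
print(timer.is_loaded(now=541))    # False: timer expired, model unloaded
```

The point of the model is that the deadline is measured from the most recent request, not from when the model was first loaded, which is why a model in steady use never unloads.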
You can see the countdown in the UNTIL
