Picking and Pulling Models
Understanding What You Can Run on Your System
As we saw earlier, some models need more RAM than you have, some need a GPU you don't own, and some will technically load but generate at a pace that makes them useless for real work.
Before you pull anything, you need to understand what you can run on your system: how much RAM the weights occupy at a given quantization, how much extra the KV cache eats as context grows, what your CPU or GPU can actually push through, and where the bottleneck lives on your specific machine. Get this right and you stop wasting bandwidth on downloads you'll delete an hour later.
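To get a feel for the numbers, here is a back-of-the-envelope sketch of the two biggest memory costs. The function names are made up for illustration. The formulas are the usual rough approximations (weights are parameters times bits per weight, and the KV cache is keys plus values for every layer and every token), and they ignore runtime overhead. The example figures (32 layers, 8 KV heads, head dimension 128) are an assumption meant to resemble an 8B Llama-style model, not values read from any particular model file.

```shell
#!/usr/bin/env bash
# Rough memory estimates for a local model. Approximations only:
# real runtimes add overhead, and real quantization formats often
# use slightly more bits per weight than their name suggests.

# Weights: parameters (in billions) * bits per weight / 8 = gigabytes.
# Usage: estimate_weights_gb <params_in_billions> <bits_per_weight>
estimate_weights_gb() {
  local params_b="$1" bits="$2"
  awk -v p="$params_b" -v b="$bits" 'BEGIN { printf "%.2f\n", p * b / 8 }'
}

# KV cache: 2 (keys and values) * layers * kv_heads * head_dim
#           * context tokens * bytes per element, reported in GiB.
# Usage: estimate_kv_cache_gib <layers> <kv_heads> <head_dim> <context> [bytes_per_element]
estimate_kv_cache_gib() {
  local layers="$1" kv_heads="$2" head_dim="$3" context="$4" bytes="${5:-2}"
  awk -v l="$layers" -v h="$kv_heads" -v d="$head_dim" -v c="$context" -v e="$bytes" \
    'BEGIN { printf "%.2f\n", 2 * l * h * d * c * e / (1024 ^ 3) }'
}

# Hypothetical 8B model at 4 bits per weight.
estimate_weights_gb 8 4                # prints 4.00

# Same hypothetical model, 16-bit cache, 8192-token context.
estimate_kv_cache_gib 32 8 128 8192    # prints 1.00
```

Double the context and the cache estimate doubles with it, while the weights stay fixed. That is why a model that loads comfortably can still run out of memory partway through a long conversation.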
There's a tool to help you do this: llm-checker. Start by setting up Node.js with nvm:
# Download and install nvm:
curl -o- \
https://raw.githubusercontent.com/nvm-sh/nvm/v0.40.4/install.sh | \
bash
# in lieu of restarting the shell
\. "$HOME/.nvm/nvm.sh"
# Download and install Node.js:
nvm install 24