Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Picking and Pulling Models

Where to Find Models

Ollama's Official Library

The main place to browse Ollama-ready models is the Ollama library. Each model page shows the available tags, the approximate download size, the context window, the license, and a ready-to-copy pull command.

For example, the Granite page is:

https://ollama.com/library/granite3.3

Tags are the important part. A model name like llama3.2 or qwen2.5-coder is only the family. The tag chooses the specific variant:

ollama pull llama3.2:3b
ollama pull qwen2.5-coder:7b
ollama pull mistral:7b
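If you leave the tag off, Ollama resolves the name to the latest tag. A minimal shell sketch of the family/tag split follows. The helpers model_family and model_tag are hypothetical names invented for illustration; they are not part of the Ollama CLI.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Ollama CLI): split a model
# reference of the form family:tag into its two parts.

# Print everything before the first colon (the model family).
model_family() {
    printf '%s\n' "${1%%:*}"
}

# Print the tag after the colon. A bare family name falls back to
# "latest", mirroring what Ollama does when no tag is given.
model_tag() {
    case "$1" in
        *:*) printf '%s\n' "${1#*:}" ;;
        *)   printf 'latest\n' ;;
    esac
}

model_family "qwen2.5-coder:7b"   # prints: qwen2.5-coder
model_tag "qwen2.5-coder:7b"      # prints: 7b
model_tag "mistral"               # prints: latest
```

The last call is the case to watch: a bare name silently becomes latest, which is why spelling out the tag is the safer habit.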

When choosing a tag, check three things before you pull:

  • Parameter count. Bigger models usually reason better, but need more RAM or VRAM and run slower.

  • Quantization and size. For the same parameter count, a smaller download usually means heavier quantization. That saves memory, but can reduce quality.

  • License. Some models are easy to use commercially; others have custom terms you need to read before shipping.
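The trade-off between parameter count and quantization is mostly arithmetic: the weights take roughly (parameters × bits per weight) / 8 bytes. Here is a rough sketch using a hypothetical helper named estimate_weights_gb. It ignores the context cache and runtime overhead, and real quantized files tend to use somewhat more bits per weight than the nominal figure, so treat the result as a lower bound rather than a measurement.

```shell
#!/bin/sh
# Hypothetical helper: estimate the size of the weights in GB.
#   $1 = parameter count in billions
#   $2 = bits per weight (e.g. 16 for fp16, 4 for a 4-bit quantization)
# Billions of parameters * bits / 8 = billions of bytes, i.e. decimal GB.
estimate_weights_gb() {
    awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

estimate_weights_gb 7 16   # 7B model at 16 bits per weight: prints 14.0
estimate_weights_gb 7 4    # 7B model at 4 bits per weight:  prints 3.5
estimate_weights_gb 3 4    # 3B model at 4 bits per weight:  prints 1.5
```

Compare the estimate against your free RAM or VRAM, leaving headroom for the context window.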

Once you have pulled a few models, ollama list is your local inventory.
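If you want just the model names from that inventory, for scripting, a small filter is enough. The sketch below assumes that ollama list prints a header row followed by one model per line with the name in the first column. The helper installed_models is a hypothetical name, not an Ollama command.

```shell
#!/bin/sh
# Hypothetical helper: read `ollama list` output on stdin and print only
# the model names (first column), skipping the header row.
# Assumption: the first line is a header and each later line starts
# with the model reference (family:tag).
installed_models() {
    awk 'NR > 1 && NF > 0 { print $1 }'
}

# Run it only when the ollama CLI is available on this machine.
if command -v ollama >/dev/null 2>&1; then
    ollama list | installed_models
fi
```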

Note: Use the library to discover models, llm-checker to sanity-check what your hardware can handle, and explicit tags to avoid accidentally downloading whatever :latest happens to mean that day.

Hugging Face GGUF Repos

GGUF models hosted on Hugging Face can usually be pulled and run in Ollama directly, with no conversion step, as long as Ollama supports the model's architecture.

The Hugging Face (HF) model search has many filters. These are the ones that matter when hunting for models to run locally:

Tasks: what the model is trained to do. The main ones you'll use:

  • text-generation: standard chat and completion models (Llama, Qwen, Mistral)
  • image-text-to-text: vision-language models that take an image plus a prompt and respond in text (Gemma 3, Qwen2.5-VL, LLaVA)
  • text-to-image: diffusion models like Stable Diffusion. These do not run in Ollama.
  • automatic-speech-recognition: Whisper and similar. These do not run in Ollama either.
  • feature-extraction: embedding models for RAG (nomic-embed, bge, e5)
  • HF lists many other tasks, but these are the ones you will meet most often when looking for local models.

Parameters: the parameter count, covered above. HF's auto-detected size is unreliable for some model architectures, so treat the size widget as a hint, not a fact.

Libraries: the file format and framework the weights ship in. Ollama needs GGUF, the format llama.cpp uses. The other formats below will not load in Ollama without conversion:

  • pytorch, safetensors, transformers: raw weights for the HF Transformers library.
  • mlx: Apple Silicon native format for the MLX framework.
  • tensorflow, jax: weights for Google's TensorFlow and JAX frameworks.

A model often has multiple library tags because it's published in several formats.
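When a repository mixes formats, the files Ollama can use are the ones ending in .gguf. As a small sketch, assuming you already have the repository's file names one per line (for example, copied from its Files tab), a filter like this picks them out. The helper gguf_files is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read file names on stdin (one per line) and print
# only those ending in .gguf, case-insensitively.
# `|| true` keeps the exit status zero when nothing matches.
gguf_files() {
    grep -i '\.gguf$' || true
}

# Illustrative file names only, not taken from a real repository.
printf '%s\n' "model.safetensors" "model-Q4_K_M.gguf" "README.md" | gguf_files
# prints: model-Q4_K_M.gguf
```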

Apps: which local runtimes the model has been tested with.

  • ollama: confirmed to pull and run via ollama run hf.co/$REPO
  • llama.cpp: works with raw llama-cli or llama-server
  • lm-studio, jan: GUI runtimes that wrap llama.cpp
  • mlx-lm: runs MLX-format models on Apple Silicon
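The hf.co/$REPO form from the ollama entry can be assembled like this. This is a minimal sketch: hf_model_ref is a hypothetical helper, the repository name is a placeholder, and the optional suffix assumes the repository publishes a file for that quantization.

```shell
#!/bin/sh
# Hypothetical helper: build an Ollama model reference for a Hugging Face
# GGUF repository.
#   $1 = user/repo on Hugging Face
#   $2 = optional quantization tag (e.g. Q4_K_M)
hf_model_ref() {
    if [ -n "${2:-}" ]; then
        printf 'hf.co/%s:%s\n' "$1" "$2"
    else
        printf 'hf.co/%s\n' "$1"
    fi
}

# "some-user/some-model-GGUF" is a placeholder, not a real repository.
hf_model_ref "some-user/some-model-GGUF"
# prints: hf.co/some-user/some-model-GGUF
hf_model_ref "some-user/some-model-GGUF" "Q4_K_M"
# prints: hf.co/some-user/some-model-GGUF:Q4_K_M
```

Pass the result to Ollama with ollama run "$(hf_model_ref USER/REPO)". Without a quantization suffix, the choice of file is left to Ollama's default.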
