Working with the Model Library
Inspecting Models
Using the CLI
ollama show prints the model's metadata: architecture, parameter count, context length, quantization, license, and the system prompt baked in (if any).
# this is the model we used
export MODEL="granite3.3:2b"
# print the model's metadata
ollama show $MODEL
Example output:
  Model
    architecture        granite
    parameters          2.5B
    context length      131072
    embedding length    2048
    quantization        Q4_K_M

  Capabilities
    completion
    tools

  License
    Apache License
    Version 2.0, January 2004
Read this before you commit to a model. Each field tells you something that affects how the model behaves, what it can do, and whether it fits on your hardware.
architecture is the model family the weights belong to: granite, llama, qwen2, mistral, gemma, and so on. This matters because different architectures have different prompt templates, tokenizers, and quirks. Ollama handles this for you when you use ollama run, but it's relevant if you're calling the raw API or importing a GGUF file by hand.

parameters is the model's size in number of weights (2.5B means 2.5 billion parameters). This is the headline number for capability and resource cost: more parameters generally means more capable responses, but also more disk space, more memory, and slower inference.

context length is the maximum amount of text the model can process in one go, measured in tokens. 131072 means 131,072 tokens (128 × 1024, usually written as 128K), which works out to around 98,000 words of English text, assuming a token averages about 0.75 of a word.

embedding length is the size of the internal vector the model uses to represent each token, sometimes called the hidden dimension. 2048 here means every token is turned into a vector of 2,048 numbers as it flows through the model. You don't set or tune this: it's fixed by the architecture. It's mostly useful when you're building advanced tools like semantic search or RAG using this model.

quantization is how aggressively the weights were compressed from their original precision. Q4_K_M, for example, is a roughly 4-bit format.

Capabilities lists what the model can do beyond plain text generation. The values you'll see at the time of writing are completion, tools, vision, embedding, and thinking. If a capability isn't listed, the model wasn't trained for it; bolting it on at runtime usually doesn't work.

License is the legal terms attached to the model. This matters more than people think. Apache 2.0 and MIT let you use the model commercially with minimal restrictions. Llama's license has acceptable-use clauses and a 700M monthly active user threshold. Some "open" models (Gemma, certain Mistral variants) have custom licenses with their own conditions. If you're shipping a product, read the license before you ship.
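The context-length arithmetic above, and a rough feel for how parameter count and quantization translate into size, can be sketched in a few lines of shell. This is a back-of-the-envelope sketch only: approx_words and approx_weight_mib are hypothetical helper names, and the bits-per-weight figure is an assumption you supply (4.5 is used below purely for illustration), not something ollama show reports.

```shell
# Rough word capacity of a context window, using the ~0.75 words-per-token
# rule of thumb from the text (integer math: tokens * 3 / 4).
approx_words() {
  echo $(( $1 * 3 / 4 ))
}

# Rough size of the weights alone, in MiB.
#   $1: parameter count
#   $2: bits per weight in tenths (45 means 4.5 bits); this is an ASSUMED
#       figure, pick one that matches the quantization you are looking at.
approx_weight_mib() {
  echo $(( $1 * $2 / 80 / 1048576 ))
}

approx_words 131072               # prints 98304
approx_weight_mib 2533539840 45   # prints 1359 (under the assumed 4.5 bits)
```

Treat the second number as a floor for the weights alone: memory use at runtime will be higher, since the context window also consumes memory.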
ollama show also accepts flags that print individual parts of the model's definition:
# the Modelfile used to build this model
ollama show $MODEL --modelfile
# default sampling parameters
ollama show $MODEL --parameters
# the prompt template
ollama show $MODEL --template
# the system prompt, if set
ollama show $MODEL --system
Reminder: A Modelfile is Ollama's recipe for building a model. It's a plain text file, similar in spirit to a Dockerfile. We'll cover Modelfiles in more detail later.
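If you want to keep these pieces around for comparison between models, one option is to write each one to its own file. The sketch below assumes only the four flags shown above; dump_model_sections and the output directory are hypothetical names chosen for illustration.

```shell
# Save each section printed by `ollama show` into its own text file.
#   $1: model name, e.g. granite3.3:2b
#   $2: output directory (created if missing)
dump_model_sections() {
  dms_model=$1
  dms_outdir=$2
  mkdir -p "$dms_outdir"
  for dms_flag in modelfile parameters template system; do
    # One file per flag: modelfile.txt, parameters.txt, and so on.
    ollama show "$dms_model" "--$dms_flag" > "$dms_outdir/$dms_flag.txt"
  done
}
```

A call such as dump_model_sections "$MODEL" ./model-info would leave four files in ./model-info; a file may be empty when the model doesn't define that section (for example, no system prompt).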
Using the API
The API equivalent is /api/show:
# Choose a model (the same one as in the CLI section, so it matches the sample response below)
export MODEL=granite3.3:2b
# Pull the model if you haven't already
ollama pull $MODEL
# Show details
curl -s http://localhost:11434/api/show \
-d "{\"model\": \"$MODEL\"}" | jq .
The response is large, but the top-level keys you'll care about are:
{
"license": "Apache License...",
"modelfile": "# Modelfile generated by ...",
"parameters": "stop \"<|end_of_text|>\"\nstop \"<|eom_id|>\"",
"template": "{{ .System }}{{ .Prompt }}",
"system": "",
"details": {
"parent_model": "",
"format": "gguf",
"family": "granite",
"families": ["granite"],
"parameter_size": "2.5B",
"quantization_level": "Q4_K_M"
},
"model_info": {
"general.architecture": "granite",
"general.parameter_count": 2533539840,
"granite.context_length": 131072,
"granite.embedding_length": 2048,
"granite.attention.head_count": 32,
"granite.attention.head_count_kv": 8,
"tokenizer.ggml.model": "gpt2"
},
"capabilities": ["completion", "tools"]
}
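Because capabilities is a plain JSON array, a script can check it before relying on a feature such as tool calling. The following is a minimal sketch assuming jq is installed (it is already used above); has_capability is a hypothetical helper name, and it reads an /api/show response on standard input.

```shell
# Succeeds (exit status 0) if the /api/show JSON on stdin lists the capability.
#   $1: capability name, e.g. tools
has_capability() {
  # jq -e sets a non-zero exit status when the result is false or null.
  jq -e --arg cap "$1" '(.capabilities // []) | any(. == $cap)' > /dev/null
}

# Example usage against a running Ollama server:
#   curl -s http://localhost:11434/api/show -d "{\"model\": \"$MODEL\"}" \
#     | has_capability tools && echo "tool calling is listed"
```

Treating a missing capabilities key as an empty list is a design choice in this sketch, so that responses without the key simply report "not listed" instead of failing.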
Where each ollama show field lives in the JSON:
| CLI output | API path |
|---|---|
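One way to explore the mapping yourself is to pull the CLI-style fields out of the response with jq. The paths in this sketch are read off the sample response shown earlier in this section, so treat them as assumptions to verify against your own output; summarize_show is a hypothetical helper name. Note that the context and embedding lengths live under keys prefixed with the architecture name, which is why the filter looks the architecture up first.

```shell
# Reduce an /api/show response (on stdin) to the fields `ollama show` prints.
summarize_show() {
  jq '
    .model_info["general.architecture"] as $arch
    | {
        architecture:     $arch,
        parameters:       .details.parameter_size,
        context_length:   .model_info["\($arch).context_length"],
        embedding_length: .model_info["\($arch).embedding_length"],
        quantization:     .details.quantization_level,
        capabilities:     .capabilities
      }'
}

# Example usage against a running Ollama server:
#   curl -s http://localhost:11434/api/show -d "{\"model\": \"$MODEL\"}" \
#     | summarize_show
```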
