Core Concepts: From Tokens and Embeddings to Quantization and KV Cache
Embeddings
Think of embeddings as "a giant lookup book".
The model has one row per token in its vocabulary. Each row holds a long list of numbers that describes the meaning of that token in a way the model can do math on.
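To make the "lookup book" picture concrete, here is a minimal sketch of an embedding table as a plain list of rows. The table size, the numbers, and the helper names `embed` and `dot` are all made up for illustration; a real model stores tens of thousands of rows with hundreds or thousands of numbers each.

```python
# Toy embedding table: one row per token in a tiny, made-up vocabulary.
# The values are illustrative only and do not come from any real model.
embedding_table = [
    [0.12, -0.40, 0.88, 0.05],   # row 0
    [-0.31, 0.27, 0.10, 0.93],   # row 1
    [0.55, 0.61, -0.22, -0.47],  # row 2
]


def embed(token_id):
    """Return the row of numbers stored for this token ID."""
    return embedding_table[token_id]


def dot(a, b):
    """One example of 'math on meaning': a similarity score between two rows."""
    return sum(x * y for x, y in zip(a, b))


print(embed(1))                  # the list of numbers stored in row 1
print(dot(embed(0), embed(2)))   # a single similarity score for rows 0 and 2
```

The lookup itself is just indexing by row number, which is why a token only needs an ID to reach its embedding.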
When you type "unbelievable things happen", three things happen:
Step 1: Chop the text into tokens:
A tokenizer splits your sentence into small pieces: whole words when they're common, smaller chunks when they're not. "Unbelievable" is long and uncommon enough that the tokenizer often breaks it up:
["un", "believ", "able", " things", " happen"]
This is why you'll sometimes see a 3-word sentence become 5 or 6 tokens. The model doesn't have every word in its vocabulary, so it builds rare words from familiar pieces.
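The splitting in Step 1 can be sketched with a toy greedy longest-match tokenizer. This is a simplification under stated assumptions: the vocabulary `VOCAB` is hypothetical and contains only the pieces from the example, and real tokenizers (for instance byte-pair encoding) learn their vocabulary from data and apply different rules.

```python
# Hypothetical mini-vocabulary; real tokenizers learn theirs from data.
VOCAB = {"un", "believ", "able", " things", " happen"}


def tokenize(text, vocab=VOCAB):
    """Greedy longest-match: at each position, take the longest known piece."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for end in range(len(text), i, -1):
            piece = text[i:end]
            if piece in vocab:
                tokens.append(piece)
                i = end
                break
        else:
            # No known piece starts here; fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


print(tokenize("unbelievable things happen"))
# ['un', 'believ', 'able', ' things', ' happen']
```

Three words go in and five tokens come out, because "unbelievable" is not in the vocabulary as a whole word and has to be assembled from smaller pieces.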
Step 2: Look up each token's row number:
Every token has a fixed row number in "the model's book", called a token ID. The tokenizer doesn't think; it just looks up the number.
"un" -> 515
"believ" -> 67473
"able" -> 481Local AI Engineering with Ollama
Run, understand, customize, fine-tune, and build agentic apps on your own hardwareEnroll now to unlock all content and receive all future updates for free.
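Step 2 amounts to a dictionary lookup. The sketch below uses only the three token IDs listed above; the names `TOKEN_TO_ID` and `to_ids` are hypothetical, and a real tokenizer ships a complete table covering its whole vocabulary.

```python
# IDs for the three pieces listed in the text; not a complete vocabulary.
TOKEN_TO_ID = {"un": 515, "believ": 67473, "able": 481}


def to_ids(tokens, table=TOKEN_TO_ID):
    """Map each token to its fixed ID with a plain dictionary lookup."""
    return [table[token] for token in tokens]


print(to_ids(["un", "believ", "able"]))  # [515, 67473, 481]
```

No computation happens here: the same token always maps to the same ID, regardless of the surrounding sentence.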
