Transformer Models

A transformer is the kind of AI model behind almost every chatbot and language tool you've used, including ChatGPT, Claude, Gemini, and every model we'll run with Ollama.

When people say "large language model" or "LLM", they almost always mean a transformer. GPT itself stands for "Generative Pre-trained Transformer", which is a fancy way of saying "a transformer that was trained in advance on lots of text and can write new text".

The name "transformer" comes from a 2017 research paper called "Attention Is All You Need", written by a team at Google. Before that paper, AI systems that worked with language were slower to train, smaller, and noticeably worse at holding a conversation or writing coherent text.

The transformer is a neural network architecture that changed all that. It gave researchers a design that could be trained on enormous amounts of text, scaled up to billions of parameters, and still run fast enough to be useful. Every major language model family since then, including GPT, Llama, Claude, Gemini, and DeepSeek, has been built on this same foundation or on an improved variant of it.

What a transformer actually does is simple to describe: you give it some text, and it predicts which word (or piece of a word, called a token) should come next. Then it does it again, and again, one token at a time, until it has produced a full answer. That's the entire trick. Everything impressive these models do (writing code, summarizing documents, answering questions) comes from getting very, very good at this one prediction task after being trained on a huge slice of the internet and books.
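To make "predicting what comes next" concrete, here is a minimal sketch in Python. It is not a transformer: it is a toy stand-in that simply counts which word follows which in a tiny made-up sentence. The function names (`train_bigram`, `next_word_probs`) and the sample text are invented for illustration. What it shares with a real model is the shape of the task: given some text, produce a probability for each possible next word.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_word_probs(follows, word):
    """Turn the follow-up counts for `word` into probabilities."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# Toy training text (an assumption for this example, not real training data).
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

# After "the", the toy model has seen "cat" twice and "mat" once.
probs = next_word_probs(model, "the")
print(probs)
```

A real transformer does the same kind of thing at a vastly larger scale: it considers the whole preceding text rather than just the last word, and it learns its probabilities through training instead of simple counting.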

The decoder is the part of the transformer that generates the output, one token at a time. It takes the embeddings (numeric representations) of your input and produces tokens in a loop: predict one, feed it back in, predict the next, until it's done.
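The loop itself can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `NEXT_TOKEN_SCORES` is a hand-written lookup table playing the role of a trained model, and `<start>` and `<end>` are made-up marker tokens. Unlike a real decoder, this toy looks only at the last token and always picks the highest-scoring candidate, but the predict, append, repeat structure is the same idea.

```python
# Hypothetical stand-in for a trained model: maps the last token to a score
# for each candidate next token. A real transformer uses the whole context.
NEXT_TOKEN_SCORES = {
    "<start>": {"hello": 0.9, "world": 0.1},
    "hello": {"world": 0.8, "<end>": 0.2},
    "world": {"<end>": 0.7, "hello": 0.3},
}

def predict_next(tokens):
    """Pick the highest-scoring next token given the tokens so far."""
    scores = NEXT_TOKEN_SCORES[tokens[-1]]
    return max(scores, key=scores.get)

def generate(prompt_tokens, max_new_tokens=10):
    """Predict one token, feed it back in, and repeat until done."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        nxt = predict_next(tokens)
        if nxt == "<end>":      # the model signals that it is finished
            break
        tokens.append(nxt)      # feed the prediction back in as input
    return tokens

output = generate(["<start>"])
print(output)
```

The `max_new_tokens` cap is a design choice worth noting: without it, a model that never predicts the end marker would loop forever.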

A simplified diagram of a transformer

