Running Your Fine-Tuned Model in Ollama
Step 1: Export to GGUF
Right now your model exists as two pieces: the frozen Granite base and the small adapter you trained, layered on top. To run outside the training tool, the two have to be combined into a single model. That step is called merging.
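To make merging concrete, here is a toy sketch in plain Python. It assumes the adapter is a LoRA adapter (the folder name granite_sql_lora suggests this), which stores two small matrices, A and B, and is merged as W + (alpha / rank) * B·A. The 2x2 weights, the alpha and rank values, and the helper names are invented for illustration. This is not Unsloth's code, only the arithmetic behind the idea.

```python
# Toy illustration of merging a LoRA adapter into a base weight matrix.
# Pure Python with tiny made-up numbers; real models do this per layer on large tensors.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matvec(w, x):
    """Multiply matrix w by the vector x (a flat list)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def merge_lora(w, a, b, alpha, rank):
    """Return w + (alpha / rank) * (b @ a), the merged weight matrix."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# Hypothetical base weight (2x2) and rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [-0.5]]
A = [[2.0, 4.0]]
ALPHA, RANK = 2, 1

x = [1.0, 1.0]
merged = merge_lora(W, A, B, ALPHA, RANK)

# Base-plus-adapter: run the base and the adapter separately, then add the results.
scale = ALPHA / RANK
adapter_out = matvec(B, matvec(A, x))
separate = [base + scale * extra for base, extra in zip(matvec(W, x), adapter_out)]

# The merged matrix produces the same output with a single multiply.
print(matvec(merged, x), separate)
```

Both paths give the same result, which is why the merged model needs neither the adapter files nor the training tool at run time.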
After merging, the model still has to be converted to GGUF, the file format Ollama uses to run models.
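A GGUF file is easy to recognize: it begins with the four ASCII bytes GGUF, followed by a format version and two counts. The sketch below reads that header as a quick sanity check on an exported file. It assumes a little-endian file at GGUF version 2 or later (where the counts are 64-bit). The function name and the demo values are made up, and the demo writes a fake header rather than a real model.

```python
import os
import struct
import tempfile

def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count), or raise ValueError."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # Little-endian: uint32 version, uint64 tensor count, uint64 metadata key/value count.
    return struct.unpack("<IQQ", header[4:24])

# Demo on a fake header (not a real model): version 3, 2 tensors, 5 metadata entries.
with tempfile.NamedTemporaryFile(suffix=".gguf", delete=False) as tmp:
    tmp.write(b"GGUF" + struct.pack("<IQQ", 3, 2, 5))
demo_header = read_gguf_header(tmp.name)
os.remove(tmp.name)
print(demo_header)
```

Once you have a real export, you could point read_gguf_header at the .gguf file in your output folder to confirm the conversion produced a valid file before handing it to Ollama.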
Unsloth does both steps, merge and convert, in one call:
# export_gguf.py
from unsloth import FastLanguageModel
# Reload the base model with your trained adapter on top.
# We point at the adapter folder, not the original model name.
# Unsloth reads the config inside it, fetches the matching base model,
# and reattaches your trained adapter automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    # the folder you saved your adapter in
    model_name = "granite_sql_lora",
    # same length used during training
    max_seq_length = 2048,
    # load the compressed base, as in training
    load_in_4bit = True,
)
# Merge the adapter into the base and convert to GGUF in one call.
model.save_pretrained_gguf(
    # output folder for the .gguf file
