Local AI Engineering with Ollama

Run, understand, customize, fine-tune, and build agentic apps on your own hardware

Running Your Fine-Tuned Model in Ollama

Step 1: Export to GGUF

Right now your model is base-plus-adapter: the frozen Granite base with your small trained adapter layered on top. To run anywhere outside the training tool, the two have to be combined into a single model. That step is called merging.
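To make merging concrete, here is a toy sketch in plain Python. It assumes the adapter is LoRA-style (the folder name granite_sql_lora suggests this, but it is an assumption), meaning a pair of small matrices A and B added to a frozen weight W with a scale of alpha / r. The matrices and values below are invented for illustration and have nothing to do with Granite's real weights.

```python
# Toy illustration of merging a LoRA-style adapter into one weight matrix.
# Assumption: the adapter is a low-rank pair (A, B) applied with scale alpha / r.
# All numbers here are made up for the example.

def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def add(x, y):
    """Element-wise sum of two same-shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]

def scale(x, s):
    """Multiply every entry of a matrix by the scalar s."""
    return [[a * s for a in row] for row in x]

# Frozen base weight W (2x3): maps 3 inputs to 2 outputs.
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0]]

# Adapter of rank r = 1: B is 2x1, A is 1x3.
B = [[0.5], [-1.0]]
A = [[2.0, 0.0, 4.0]]
alpha, r = 2, 1
scaling = alpha / r

x = [[1.0], [2.0], [3.0]]  # one input as a column vector

# Base-plus-adapter: run the base and the adapter separately, then add.
unmerged = add(matmul(W, x), scale(matmul(B, matmul(A, x)), scaling))

# Merged: fold the adapter into the weight once, then run a single matrix.
W_merged = add(W, scale(matmul(B, A), scaling))
merged = matmul(W_merged, x)

print(unmerged == merged)  # both paths give the same output
```

After merging, only W_merged is needed: the adapter matrices no longer have to travel with the model, which is what lets the result be packaged as one file.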

After merging, the model still has to be converted to GGUF, the file format Ollama uses to run models.
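Once an export finishes, a quick sanity check is to confirm that the output really is a GGUF file. GGUF files begin with the four ASCII bytes GGUF followed by a 32-bit version number. The sketch below reads just that header; the helper name read_gguf_version is hypothetical, and it assumes a little-endian file, which is the common case.

```python
import struct

def read_gguf_version(path):
    """Return the GGUF version number, or None if the file lacks the GGUF magic.

    Hypothetical helper for illustration; assumes a little-endian GGUF file.
    """
    with open(path, "rb") as f:
        header = f.read(8)  # 4 magic bytes + 4-byte unsigned version
    if len(header) < 8 or header[:4] != b"GGUF":
        return None
    (version,) = struct.unpack("<I", header[4:8])
    return version
```

Calling read_gguf_version on the exported file should return an integer; a result of None means the file is not GGUF and Ollama will not be able to load it.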

Unsloth does both steps, merge and convert, in one call:

# export_gguf.py
from unsloth import FastLanguageModel

# Reload the base model with your trained adapter on top.
# We point at the adapter folder, not the original model name.
# Unsloth reads the config inside it, fetches the matching base model,
# and reattaches your trained adapter automatically.
model, tokenizer = FastLanguageModel.from_pretrained(
    # the folder you saved your adapter in
    model_name = "granite_sql_lora",   

    # same length used during training
    max_seq_length = 2048,             

    # load the compressed base, as in training
    load_in_4bit = True,               
)

# Merge the adapter into the base and convert to GGUF in one call.
model.save_pretrained_gguf(

    # output folder for the .gguf file
