Building a Management CLI for Ollama Using the SDK
Setup and Requirements
Configure Host and Model with .env
The SDK reads OLLAMA_HOST from the environment by default. We add OLLAMA_MODEL ourselves so we don't hardcode a model name in every script. Both go in a .env file at the project root.
Copy the .env.example file from the companion repo:
# Change directory
cd $HOME/companion/code/ollama-cli
# Copy the file
cp .env.example .env
Update the .env file with the host and model name. If you're testing locally, with your code running on the same machine as Ollama, use http://localhost:11434:
# ollama-cli/.env
OLLAMA_HOST=http://localhost:11434
OLLAMA_MODEL=granite3.3:2b
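If you're testing against a remote server, replace localhost with the server's IP address or hostname. The Ollama server must also be configured to listen on that address; on a systemd-managed install you do that with a service override.
Create a directory for the systemd override (this and the following commands need root privileges):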
mkdir -p /etc/systemd/system/ollama.service.d/
Write the override file. This is a drop-in that supplements the unit file rather than editing it:
cat > /etc/systemd/system/ollama.service.d/override.conf <<'EOF'
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_ORIGINS=*"
EOF
Warning: This exposes Ollama to the network with no authentication. Don't use it in production.
OLLAMA_HOST=0.0.0.0 binds to every interface, and OLLAMA_ORIGINS=* drops the cross-origin check. Ollama ships no auth of its own, so anyone who can reach port 11434 can list your models, run inference, and pull new ones, on your hardware. For anything beyond localhost, put Ollama behind a reverse proxy that handles auth and TLS. The minimum is HTTP Basic Auth behind nginx (auth_basic with an htpasswd file); Caddy is a simpler alternative that terminates TLS automatically. Keep Ollama itself bound to 127.0.0.1 and let the proxy be the only thing listening publicly.
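If you do put Ollama behind a Basic Auth proxy, the client has to send credentials with every request. The following is a minimal sketch of building that header with the standard library. The helper name `basic_auth_header` and the sample credentials are hypothetical and only illustrate the encoding.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build an HTTP Basic Auth header (hypothetical helper, not part of the SDK)."""
    # Basic Auth is "username:password", base64-encoded, prefixed with "Basic ".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Sample credentials for illustration only.
headers = basic_auth_header("Aladdin", "open sesame")
print(headers["Authorization"])  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

Assuming your SDK version accepts a `headers` keyword on `Client` (recent releases appear to forward it to the underlying HTTP client), you could pass the result as `Client(host=..., headers=headers)`. Basic Auth only base64-encodes the password, so it should only ever travel over TLS.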
Reload and restart Ollama:
# Reload
systemctl daemon-reload
# Restart
systemctl restart ollama
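After the restart, it can help to confirm that the server answers on the new address before running any SDK code. The sketch below uses only the standard library and queries Ollama's /api/version endpoint. The function names `version_url` and `server_version` are hypothetical helpers, and the 3-second timeout is an arbitrary choice.

```python
import json
import urllib.request


def version_url(host: str) -> str:
    """Build the version endpoint URL from a host such as http://localhost:11434."""
    return host.rstrip("/") + "/api/version"


def server_version(host: str, timeout: float = 3.0):
    """Return the server's version string, or None if it can't be reached."""
    try:
        with urllib.request.urlopen(version_url(host), timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (OSError, ValueError):
        # OSError covers connection failures and timeouts (URLError is a subclass).
        # ValueError covers malformed URLs and responses that are not valid JSON.
        return None
```

Calling `server_version("http://<your-server>:11434")` from the client machine should return a version string if the override took effect, and None if the port is unreachable.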
Our config file loads the variables from the .env file. If a variable is set in neither the environment nor the file, it falls back to http://localhost:11434 and granite3.3:2b.
# ollama-cli/config.py
"""Central config. Loads .env once, exposes constants.
Real environment variables win over .env values, so a shell `export
OLLAMA_HOST=...` overrides whatever is in the file. That's what you
want when switching between local and a remote server.
"""
import os
from dotenv import load_dotenv
# override=False means real env vars beat the .env file.
load_dotenv(override=False)
OLLAMA_HOST: str = os.environ.get(
    "OLLAMA_HOST",
    "http://localhost:11434"
)
OLLAMA_MODEL: str = os.environ.get(
    "OLLAMA_MODEL",
    "granite3.3:2b"
)
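To make the precedence rule concrete, here is a small standard-library sketch of the behavior that `override=False` describes: values from the file are applied only for keys that aren't already set. This is an illustration of the rule, not python-dotenv's actual implementation, and `load_env_file` is a hypothetical name.

```python
def load_env_file(text: str, environ: dict, override: bool = False) -> dict:
    """Apply KEY=VALUE lines from text to environ (simplified illustration)."""
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that isn't KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # With override=False, an existing value wins over the file.
        if override or key not in environ:
            environ[key] = value
    return environ


# Pretend the shell already exported a remote host (example address).
env = {"OLLAMA_HOST": "http://10.0.0.5:11434"}
dotenv_text = "OLLAMA_HOST=http://localhost:11434\nOLLAMA_MODEL=granite3.3:2b\n"
load_env_file(dotenv_text, env)

print(env["OLLAMA_HOST"])   # http://10.0.0.5:11434 (the shell value wins)
print(env["OLLAMA_MODEL"])  # granite3.3:2b (filled in from the file)
```

A plain dict stands in for `os.environ` here so the example doesn't touch the real process environment.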
First Call
Let's start with a smoke test.
The following script confirms the SDK can connect to the server, and prints the SDK version, the server host, and how many models are installed locally.
# ollama-cli/smoke.py
"""Confirms the SDK can talk to the local Ollama server.
Run with: uv run smoke.py
"""
# importlib.metadata reads version info from installed package metadata.
# We use this because the `ollama` package does not expose a __version__
# attribute. version("ollama") returns the same string as `pip show ollama`.
from importlib.metadata import version
# Client is the sync HTTP client. ResponseError is what the SDK raises when
# the server returns an HTTP error (404 model not found, 500 server error, etc).
from ollama import Client, ResponseError
# OLLAMA_HOST comes from .env via config.py. Centralizing it means we change
# the host in one place when switching between local and a remote server.
from config import OLLAMA_HOST
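The listing above stops after its imports. As a rough sketch of how the remaining steps described earlier could look (print the SDK version, the host, and the number of installed models), consider the following. It is not the original script. `sdk_version`, `count_models`, and `run_smoke` are hypothetical names, and the code assumes that `client.list()` returns a response with a `models` collection and that the SDK raises the built-in `ConnectionError` when the server is unreachable.

```python
from importlib.metadata import PackageNotFoundError, version


def sdk_version(package: str = "ollama") -> str:
    """Return the installed package version, or 'not installed'."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "not installed"


def count_models(client) -> int:
    """Count locally installed models using client.list()."""
    response = client.list()
    # Assumption: `models` is available as an attribute (newer SDK
    # releases) or as a dict key (older releases).
    models = getattr(response, "models", None)
    if models is None:
        models = response["models"]
    return len(models)


def run_smoke(host: str) -> int:
    """Print SDK version, host, and model count. Returns an exit code."""
    # Imported here so the helpers above work without the SDK installed.
    from ollama import Client, ResponseError

    print(f"SDK version: {sdk_version()}")
    print(f"Server host: {host}")
    try:
        total = count_models(Client(host=host))
    except ResponseError as exc:
        # The server answered, but with an HTTP error.
        print(f"Server error: {exc.error} (status {exc.status_code})")
        return 1
    except ConnectionError as exc:
        # Assumption: raised by the SDK when nothing answers at `host`.
        print(f"Could not reach the server: {exc}")
        return 1
    print(f"Models installed: {total}")
    return 0
```

With the imports from the listing in place, a script could end with `raise SystemExit(run_smoke(OLLAMA_HOST))` so a failed check produces a non-zero exit code.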
