Running **open-weight LLMs locally on macOS**? This post breaks it down. It compares **llama.cpp**, which is great when you want to tweak everything, with **LM Studio**, which trades that control for simplicity. It covers what fits in memory, which quantized models to grab (hint: 4-bit GGUF), and what's coming down the pipe: **reasoning**, **tool use**, and **Mixture-of-Experts (MoE)**. **Bigger picture:** local runtimes with tool calling and MoE point to where AI is headed: cheaper, private, and modular, running right on your laptop.
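
To make the "what fits in memory" question concrete, here is a back-of-the-envelope sketch (mine, not from the post): at 4-bit quantization a model's weights take roughly half a byte per parameter, plus some headroom for the KV cache and the OS. The bits-per-weight and overhead figures below are rough assumptions, not exact GGUF numbers.

```python
# Back-of-the-envelope check: does a 4-bit quantized model fit in unified memory?
# Rule of thumb only: common 4-bit GGUF quants land around ~4.5 bits per weight,
# and the KV cache grows with context length, so treat the overhead as a guess.

def fits_in_memory(params_b: float, ram_gb: float,
                   bits_per_weight: float = 4.5,   # assumed effective 4-bit rate
                   overhead_gb: float = 2.0) -> bool:  # assumed KV cache + OS slack
    """Estimate whether a quantized model fits in `ram_gb` of unified memory.

    params_b: model size in billions of parameters (e.g. 8 for an 8B model).
    """
    weights_gb = params_b * bits_per_weight / 8  # Gparams * (bytes per param)
    return weights_gb + overhead_gb <= ram_gb

for params in (8, 14, 32, 70):
    print(f"{params}B model on a 16 GB Mac: {fits_in_memory(params, 16)}")
```

By this estimate an 8B or 14B model fits comfortably in 16 GB, while 32B and 70B do not, which matches the usual advice to match model size to your Mac's unified memory before downloading.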