Alex Ewerlöf walks through running open-weight models such as Gemma 4 locally for agentic coding with LM Studio, wiring them into Copilot and Pi as custom endpoints, and avoiding the practical traps around context length, KV-cache quantization, and cold-start prompt processing.
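
To make the link between context length and KV-cache quantization concrete, here is a rough back-of-the-envelope sketch. It is not taken from the walkthrough: the function name `kv_cache_bytes` and the model dimensions (32 layers, 8 KV heads, head dimension 128) are hypothetical placeholders, not the configuration of Gemma 4 or any other specific model. It also assumes plain full attention in every layer and ignores the small per-block scale overhead that real quantized cache formats add, so treat the numbers as an approximation only.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, bits_per_value=16):
    """Approximate KV-cache size in bytes for a full-attention transformer.

    Assumes every layer caches keys and values for every token in the
    context, and ignores quantization block overhead (an approximation).
    """
    # Factor of 2: one tensor for keys plus one for values, per layer.
    n_values = 2 * n_layers * n_kv_heads * head_dim * context_len
    return n_values * bits_per_value // 8


if __name__ == "__main__":
    # Hypothetical model dimensions, chosen only for illustration.
    layers, kv_heads, head_dim = 32, 8, 128
    context = 32_768

    for bits in (16, 8, 4):
        gib = kv_cache_bytes(layers, kv_heads, head_dim, context, bits) / 1024**3
        print(f"{bits:>2}-bit KV cache at {context} tokens: {gib:.1f} GiB")
```

Under these assumed dimensions, a 32,768-token context needs about 4 GiB of cache at 16 bits per value, 2 GiB at 8 bits, and 1 GiB at 4 bits. The cache grows linearly with context length, which is why a long context window and cache quantization settings have to be chosen together on memory-constrained hardware.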










