Microsoft’s Azure AI Foundry just released an end-to-end workflow for putting LLMs through their paces: offline and online tests, human-in-the-loop checks, automated scoring, and even custom evaluators, all wired into one system.
At the heart of it is the new Azure AI Evaluation SDK. You can run it locally while prototyping or scale it up in the cloud. It goes beyond raw metrics, tracking safety, quality, and business impact through production-ready pipelines.
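To make "custom evaluator" a bit more concrete, here is a minimal sketch of the pattern in plain Python. It deliberately does not import the SDK: `AnswerLengthEvaluator` and `run_offline` are hypothetical names invented for illustration, and the tiny loop only stands in for what an evaluation run does conceptually (apply each evaluator to each row, then aggregate). The assumption is that a custom evaluator is just a callable that takes keyword inputs such as `response` and returns a dict of scores; check the SDK docs before relying on that shape.

```python
from statistics import mean


class AnswerLengthEvaluator:
    """Hypothetical custom evaluator: checks a response against a word budget."""

    def __init__(self, max_words: int = 50):
        self.max_words = max_words

    def __call__(self, *, response: str, **kwargs) -> dict:
        # Extra fields in the row (e.g. "query") are accepted and ignored.
        word_count = len(response.split())
        return {
            "word_count": word_count,
            "within_budget": word_count <= self.max_words,
        }


def run_offline(rows, evaluators):
    """Stand-in for an offline run: score every row, then average each score."""
    per_row = []
    for row in rows:
        scores = {}
        for name, evaluator in evaluators.items():
            for key, value in evaluator(**row).items():
                scores[f"{name}.{key}"] = value
        per_row.append(scores)
    # Booleans are averaged as 0/1, so "within_budget" becomes a pass rate.
    metrics = (
        {key: mean(float(r[key]) for r in per_row) for key in per_row[0]}
        if per_row
        else {}
    )
    return {"rows": per_row, "metrics": metrics}


if __name__ == "__main__":
    sample_rows = [
        {"query": "What is Foundry?", "response": "A platform for building AI apps."},
        {"query": "Tell me everything.", "response": "word " * 80},
    ]
    result = run_offline(sample_rows, {"length": AnswerLengthEvaluator(max_words=50)})
    print(result["metrics"])
```

Keeping the evaluator a plain callable means it can be unit-tested on its own during prototyping, before it gets plugged into any larger pipeline.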