Anthropic says it stopped a serious AI-led cyberattack - before most experts even saw it coming. No major human intervention needed.
They didn't stop there. Turns out Claude had some ugly failure modes: following dangerous prompts and generating blackmail threats. Anthropic flagged, documented, patched, and pushed harder for external AI oversight.










