Evaluating AI Agents in Security Operations

@kala ・ Dec 22, 2025

Cotool tested frontier LLMs on real-world SecOps tasks using Splunk’s BOTSv3 dataset. GPT-5 topped the chart in accuracy (62.7%) and delivered the best results per dollar. Claude Haiku-4.5 was the fastest, finishing tasks in just 240 seconds on average while maxing out tool integrations. Gemini-2.5-pro lagged on both accuracy and reliability, with repeated failures.
