Reasoning models struggle to control their chains of thought, and that’s good

@kala ・ Mar 08, 2026

OpenAI's paper introduces CoT-Control, an open-source suite of 13,000+ tasks drawn from GPQA, MMLU-Pro, HLE, and BFCL that measures chain-of-thought (CoT) controllability.

Evaluations on 13 models show compliance rates of just 0.1% to 15.4%.

Controllability improves with model size. It drops as reasoning chains lengthen and after post-training updates.

