
Reasoning models struggle to control their chains of thought, and that’s good


OpenAI's paper introduces CoT-Control, an open-source suite of more than 13,000 tasks drawn from GPQA, MMLU-Pro, HLE, and BFCL that measures how well reasoning models can control their chains of thought (CoT).

Evaluations of 13 models show compliance rates of only 0.1% to 15.4%.

Controllability improves with model size, but it drops as reasoning chains lengthen and after post-training updates.



Kala #GenAI

FAUN.dev()

@kala
Generative AI Weekly Newsletter, Kala. Curated GenAI news, tutorials, tools and more!