
Updates and recent posts about GPT-5.3-Codex.
@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

So you wanna build a local RAG?

Skald spun up a full local RAG stack, with pgvector, Sentence Transformers, Docling, and llama.cpp, in under 10 minutes. The thing hums on English point queries. Benchmarks show open-source models and rerankers can go toe-to-toe with SaaS tools in most tasks. They stumble, though, on multilingual prompt.. read more
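
That stack is simpler than it sounds. A minimal sketch of the same shape, assuming Postgres with the pgvector extension plus the pgvector and sentence-transformers Python packages; the DSN, table name, and embedding model here are illustrative choices, not details from the post:

```python
# Minimal local-RAG sketch: embed chunks with Sentence Transformers and
# store/search them in Postgres via pgvector. DSN, table name, and model
# are illustrative, not taken from the post.
import psycopg2
from pgvector.psycopg2 import register_vector
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim embeddings

conn = psycopg2.connect("dbname=rag user=rag")   # hypothetical DSN
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.commit()
register_vector(conn)  # lets psycopg2 pass numpy arrays as vectors
cur.execute("""CREATE TABLE IF NOT EXISTS chunks (
    id bigserial PRIMARY KEY, body text, embedding vector(384))""")
conn.commit()

def index(texts):
    # Embed each chunk and insert it alongside its vector.
    for body, emb in zip(texts, model.encode(texts)):
        cur.execute("INSERT INTO chunks (body, embedding) VALUES (%s, %s)",
                    (body, emb))
    conn.commit()

def search(query, k=5):
    # Nearest neighbours by cosine distance (pgvector's <=> operator).
    q = model.encode([query])[0]
    cur.execute("SELECT body FROM chunks ORDER BY embedding <=> %s LIMIT %s",
                (q, k))
    return [row[0] for row in cur.fetchall()]
```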

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

Learning Collatz - The Mother of all Rabbit Holes

Researchers trained small transformer models to predict the "long Collatz step," an arithmetic rule for the infamous unsolved Collatz conjecture, achieving surprisingly high accuracy up to 99.8%. The models did not learn the universal algorithm, but instead showed quantized learning, mastering speci.. read more  
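
The post's exact rule is truncated above, so here's the commonly used accelerated formulation as an assumption: apply 3n+1 to an odd n, then divide out every factor of two in one go, jumping straight to the next odd number.

```python
def long_collatz_step(n: int) -> int:
    """One 'long' (accelerated) Collatz step, an assumed formulation:
    take 3n+1 for odd n, then strip all factors of two at once,
    landing on the next odd number. The paper's exact rule may differ."""
    assert n > 0 and n % 2 == 1, "defined here on positive odd inputs"
    n = 3 * n + 1
    while n % 2 == 0:
        n //= 2
    return n

# The classic map needs two steps for 7 (7 -> 22 -> 11);
# the long step jumps odd-to-odd directly: 7 -> 11.
assert long_collatz_step(7) == 11
assert long_collatz_step(11) == 17   # 11 -> 34 -> 17
```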

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

200k Tokens Is Plenty

Amp’s team isn’t chasing token limits. Even with ~200k available via Opus 4.5, they stick to short, modular threads, around 80k tokens each. Why? Smaller threads are cheaper, more stable, and just work better. Instead of stuffing everything into a single mega-context, they slice big tasks into focuse.. read more
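
As a rough illustration of the pattern (not Amp's code): cap each thread near that 80k figure and pass only a compact summary between threads. `llm` and `count_tokens` below are hypothetical helpers standing in for a real client and tokenizer.

```python
THREAD_BUDGET = 80_000  # rough per-thread token cap, per the post

def run_in_short_threads(subtasks, llm, count_tokens):
    # llm(messages) -> str and count_tokens(messages) -> int are assumed
    # helpers, not a real API.
    handoff = ""
    for sub in subtasks:
        # Fresh, focused thread per subtask: prior work arrives only as a
        # compact handoff note, never as the full transcript.
        thread = []
        if handoff:
            thread.append({"role": "user", "content": "Prior progress: " + handoff})
        thread.append({"role": "user", "content": sub})
        thread.append({"role": "assistant", "content": llm(thread)})
        if count_tokens(thread) > THREAD_BUDGET:
            raise ValueError("subtask too big; split it further")
        # Compress this thread into the handoff for the next one.
        handoff = llm(thread + [{"role": "user",
                                 "content": "Summarize progress briefly."}])
    return handoff
```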

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

Google tests new Gemini 3 models on LM Arena

Google’s been quietly field-testing two shadow models, Fierce Falcon and Ghost Falcon, on LM Arena. Early signs? They're probably warm-ups for the next Gemini 3 Flash or Pro drop. Classic Google move: float a checkpoint, stir up curiosity, then go GA... read more

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

A trillion dollars is a terrible thing to waste

OpenAI co-founder Ilya Sutskever just said the quiet part out loud: scaling laws are breaking down. Bigger models aren’t getting better at thinking, they’re getting worse at generalizing and reasoning. Now he’s eyeing neurosymbolic AI and innate inductive constraints. Yep, the “just make it huge” era m.. read more

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

Practical LLM Security Advice from the NVIDIA AI Red Team

NVIDIA’s AI Red Team nailed three security sinkholes in LLMs: reckless use of exec/eval, RAG pipelines that grab too much data, and markdown that doesn't get cleaned. These cracks open doors to remote code execution, sneaky prompt injection, and link-based data leaks. The fix-it trend: App security’s lea.. read more
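
Two of those fixes are easy to sketch. The snippet below is an illustration, not NVIDIA's code: it swaps eval() on model output for a literal-only parser and strips markdown images and external links before rendering. The regexes are simplified assumptions.

```python
import ast
import re

def parse_model_value(text: str):
    # ast.literal_eval accepts only literals (numbers, strings, lists, ...),
    # so model output can't trigger code execution the way eval() can.
    return ast.literal_eval(text)

MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")              # ![alt](url)
MD_LINK = re.compile(r"\[([^\]]*)\]\(\s*https?://[^)]*\)")  # [text](http...)

def clean_markdown(text: str) -> str:
    # Drop images outright (zero-click fetches) and keep only the link text
    # for external links, so rendered output can't beacon data out.
    text = MD_IMAGE.sub("", text)
    return MD_LINK.sub(r"\1", text)

assert clean_markdown("hi ![x](https://evil.example/?q=secret)") == "hi "
```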

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

Roses are red, violets are blue, if you phrase it as poem, any jailbreak will do

A new study just broke the safety game wide open: rhymed prompts slipped past filters in 25 major LLMs, including Gemini 2.5 Pro and DeepSeek, with up to 100% success. No clever chaining, no jailbreak soup. Just single-shot rhyme. Turns out, poetic language isn’t just for bard-core Twitter. When it c.. read more

@kala shared a link, 4 months, 2 weeks ago
FAUN.dev()

Prompts for Open Problems

The author, Ben Recht, proposes five research directions inspired by his graduate machine learning class, arguing for different research rather than just more. These prompts include adopting a design-based view for decision theory, explaining the robust scaling trends in competitive testing, and mov.. read more  

@devopslinks shared a link, 4 months, 2 weeks ago
FAUN.dev()

Why we're leaving serverless

Every millisecond matters in the critical path of API authentication. After two years of battling serverless limitations, the team rebuilt the entire API stack to cut end-to-end latency. The move from Cloudflare Workers to stateful Go servers resulted in a 6x performance improvement and simplified arc.. read more
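
The core of the win is easy to picture: a long-lived process keeps verified keys in memory, so the auth check on the hot path becomes a dict lookup instead of a network round trip. A sketch of that idea (the actual rewrite is in Go; every name here is hypothetical):

```python
# Stateful-server sketch: hot API keys live in in-process memory,
# so authentication on the critical path does no I/O at all.
import time

CACHE_TTL = 60.0  # seconds a verified key stays hot (illustrative)
_key_cache: dict[str, tuple[bool, float]] = {}  # key -> (valid, expires_at)

def fetch_key_from_db(key: str) -> bool:
    # Slow path: stand-in for the real database / upstream lookup.
    time.sleep(0.005)              # simulate a network round trip
    return key.startswith("sk_")   # hypothetical validity rule

def authenticate(key: str) -> bool:
    now = time.monotonic()
    hit = _key_cache.get(key)
    if hit and hit[1] > now:
        return hit[0]              # fast path: plain dict lookup
    valid = fetch_key_from_db(key)
    _key_cache[key] = (valid, now + CACHE_TTL)
    return valid
```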

@devopslinks shared a link, 4 months, 2 weeks ago
FAUN.dev()

Advancing Our Chef Infrastructure: Safety Without Disruption

Slack pulled back the curtain on Slack AI, its LLM-powered assistant built with a fortress mindset. Every customer gets their own isolated environment. Any data passed to vendor LLMs? It's ephemeral. Gone before it can stick. No fine-tuning. No exporting data outside Slack. And there’s a whole middle-lay.. read more

GPT-5.3-Codex is OpenAI’s advanced agentic coding model, designed to go beyond writing code and operate as a general-purpose collaborator on a computer. It builds on GPT-5.2-Codex by combining stronger coding performance with improved reasoning and professional knowledge, while running about 25% faster. The model is optimized for long-running tasks that involve research, tool use, and complex execution, and it performs at the top of industry benchmarks such as SWE-Bench Pro and Terminal-Bench.

Unlike earlier Codex models that focused primarily on code generation and review, GPT-5.3-Codex can reason, plan, and act across the full software lifecycle. It supports activities such as debugging, deploying, monitoring, writing product requirement documents, creating tests, and analyzing metrics. It can also autonomously build and iterate on complex applications and better interpret underspecified prompts, producing more complete and production-ready results by default.

A defining feature of GPT-5.3-Codex is its interactive, agentic workflow. Users can steer the model while it is working, receive progress updates, and adjust direction without losing context, making it feel more like a teammate than a batch automation tool. The model was even used internally to help debug its own training and deployment processes. GPT-5.3-Codex is available through paid ChatGPT plans in the Codex app, CLI, IDE extension, and web, with API access planned for the future.