Updates and recent posts about Magika.

@faun shared a link, 10 months ago
FAUN.dev()

“A Programmer Who Reads Is Worth Two”: Tech Books for Summer 2025

Crafting an LLM from the ground up? Dive into Sebastian Raschka’s guide. It tackles everything, from data wrangling to toeing the ethical line. Seasoned ML pros will nod in approval. Craving a sharp take on AI’s charming deceptions? Narayanan & Kapoor's "AI Snake Oil" spills the beans on marketing myths wit.. read more

@faun shared a link, 10 months ago
FAUN.dev()

Asynchrony is not Concurrency

Asynchrony isn't a twin to Concurrency in Zig. It juggles async tasks without leaning on multi-threading, letting sync and async mingle harmoniously. Concurrency craves overlap, but Zig's savvy. When resources get stingy, it smartly reverts tasks to synchronous, dodging drama like deadlocks or sudden c.. read more

@faun shared a link, 10 months ago
FAUN.dev()

10 Unspoken NestJS Secrets for Production at Scale

Unlock NestJS speed by steering clear of full module preloads. This trick slashes cold start drags, cutting first request delays by up to 10 seconds... read more

@faun shared a link, 10 months ago
FAUN.dev()

Crawling a billion web pages in just over 24 hours

Imagine tearing through 1 billion pages in a single day on a shoestring budget. This crawler pulled it off with 12 nodes and some savvy async maneuvering. But here's the kicker: it wasn’t the fetching that choked the CPU. Nope, it was the parsing. Today’s web behemoths, bloated with JavaScript and othe.. read more

@faun shared a link, 10 months ago
FAUN.dev()

Containers: Everything You Need To Know

cgroups and namespaces anchor Linux containers, isolating resources and processes like gatekeepers with a mission. On macOS and Windows, these containers ride in VMs with WSL2 or LinuxKit, putting on their "welcome to the virtual world" hats. Enter runC, executing OCI-built images with isolation flair, w.. read more

@faun shared a link, 10 months ago
FAUN.dev()

How to catch GitHub Actions workflow injections before attackers do

GitHub Actions injections are one of the most common vulnerabilities in projects. Use CodeQL to scan workflows and protect against these risks effectively... read more

@faun shared a link, 10 months ago
FAUN.dev()

Understand CPU Branch Instructions Better

Branch prediction matters. Why? About a quarter of instructions are branches, and modern CPUs nail an accuracy above 90%. Yet, those often-pesky branches can choke CPUs, stalling instruction flow. So, take a wrench to your if-else logic. Trim indirect branches whenever you can; your CPU will thank you... read more

@faun shared a link, 10 months ago
FAUN.dev()

Exhausted man defeats AI model in world coding championship

A weary-eyed Polish coder, Przemysław Dębiak, bested an OpenAI model in a grueling 10-hour face-off, reminiscent of John Henry’s epic duel against the steam-powered behemoth... read more

@faun shared a link, 10 months ago
FAUN.dev()

Parsing 1 Billion Rows in Bun/Typescript Under 10 Seconds

Bun tries to swallow files over 4GB and promptly chokes. The culprit? Its Buffer caps out at 4GB. The fix? Slice files into chunks under 4GB but keep the buffer lean, no more than 128KB, to keep things zippy. Pull out the big guns: workers. This move fires up all CPU cores, slashing processing time from.. read more

@faun shared a link, 10 months ago
FAUN.dev()

Lessons from scaling PostgreSQL queues to 100K events

PostgreSQL juggles 100,000 events per second. Just needs some index wizardry and query tweaking. The problem? Table bloat and Write Amplification. Gross. Enter the mighty COPY: it bulldozes through bulk data, politely ignoring the usual Insert drag. And those recursive CTEs? They pull off loose index scan.. read more

Magika is an open-source file type identification engine developed by Google that uses machine learning instead of traditional signature-based heuristics. Unlike classic tools such as file, which rely on magic bytes and handcrafted rules, Magika analyzes file content holistically using a trained model to infer the true file type.
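
For contrast, the signature-based approach mentioned above can be sketched in a few lines of Python. This is only an illustration: the signature table is a tiny hand-picked subset, the function name `sniff_magic` is hypothetical, and none of it reflects how `file` or Magika are actually implemented.

```python
# Minimal signature-based ("magic bytes") detector, shown for contrast with a learned model.
# SIGNATURES is a small illustrative subset, not the rule database used by `file`.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x7fELF", "elf"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_magic(data: bytes) -> str:
    """Return a label if the content starts with a known signature, else 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    print(sniff_magic(b"%PDF-1.7\n"))                  # matches the PDF signature
    print(sniff_magic(b"def main():\n    pass\n"))     # source code has no fixed signature
```

The sketch also shows where fixed prefixes run out: textual formats such as source code or configuration files carry no signature at all, and the `PK\x03\x04` prefix is shared by every ZIP-based container (for example `.docx` and `.jar`), so a prefix match alone cannot tell them apart.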

It is designed to be both highly accurate and extremely fast, capable of classifying files in milliseconds. Magika excels at detecting edge cases where file extensions are incorrect, intentionally spoofed, or absent altogether. This makes it particularly valuable for security scanning, malware analysis, digital forensics, and large-scale content ingestion pipelines.

Magika supports hundreds of file formats, including programming languages, configuration files, documents, archives, executables, media formats, and data files. It is available as a Python library and a CLI, and it integrates cleanly into automated workflows. The project is maintained by Google and released under an open-source license, making it suitable for both enterprise and research use.

Magika is commonly used in scenarios such as:

- Secure file uploads and content validation
- Malware detection and sandboxing pipelines
- Code repository scanning
- Data lake ingestion and classification
- Digital forensics and incident response
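
As a rough sketch of the first scenario, an upload check can compare the type implied by the file extension with the type detected from the content. Everything named here is an assumption for illustration: `validate_upload`, the `EXPECTED_LABEL` table, and the stand-in `toy_detect` function are hypothetical, and in a real pipeline the `detect` argument would wrap a content-based classifier such as Magika.

```python
from pathlib import PurePath
from typing import Callable

# Hypothetical allow-list mapping file extensions to the content label we expect.
EXPECTED_LABEL = {".png": "png", ".pdf": "pdf", ".zip": "zip"}


def validate_upload(filename: str, data: bytes, detect: Callable[[bytes], str]) -> bool:
    """Accept an upload only if its detected label matches the one implied by its extension."""
    ext = PurePath(filename).suffix.lower()
    expected = EXPECTED_LABEL.get(ext)
    if expected is None:
        return False  # extension is not on the allow-list
    return detect(data) == expected


def toy_detect(data: bytes) -> str:
    """Stand-in detector for the demo; a real system would call a content classifier here."""
    return "pdf" if data.startswith(b"%PDF-") else "unknown"


if __name__ == "__main__":
    print(validate_upload("report.pdf", b"%PDF-1.4\n", toy_detect))   # content matches extension
    print(validate_upload("report.pdf", b"MZ\x90\x00", toy_detect))   # spoofed extension
```

Passing the detector in as a callable keeps the validation logic independent of any particular detection library, so the stand-in can be swapped out without touching the check itself.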