SourceHut reports spending anywhere from 20% to 100% of its weekly engineering time mitigating hyper-aggressive LLM crawlers. The crawler traffic has caused dozens of brief outages, and the constant firefighting has delayed core projects.
The crawlers ignore robots.txt, hammer costly endpoints such as git blame, crawl full git logs and every commit, and rotate random User-Agents across thousands of residential IPs to blend into organic traffic and evade mitigations.
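To see why the IP rotation matters, consider the kind of defense it defeats. Below is a minimal sketch of the first mitigation a forge might reach for: a sliding-window rate limit on expensive VCS endpoints, keyed by client IP. The paths, window, and threshold are illustrative assumptions, not SourceHut's actual configuration.

```python
import time
from collections import defaultdict

# Hypothetical values for illustration only.
WINDOW = 60    # seconds
LIMIT = 30     # requests per IP per window on expensive endpoints
EXPENSIVE = ("/blame/", "/log/", "/commit/")

_hits = defaultdict(list)  # ip -> timestamps of recent expensive requests

def allow(ip, path):
    """Return True if the request should be served, False to throttle."""
    if not any(seg in path for seg in EXPENSIVE):
        return True  # cheap pages are never throttled
    now = time.time()
    recent = [t for t in _hits[ip] if now - t < WINDOW]
    if len(recent) >= LIMIT:
        _hits[ip] = recent
        return False
    recent.append(now)
    _hits[ip] = recent
    return True
```

A crawler rotating requests across thousands of residential addresses stays well under the per-IP limit on every individual address, so a limiter like this never trips; User-Agent blocklists fail the same way once the agent string is randomized. That is what makes this traffic so expensive to mitigate without also blocking legitimate users.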
Trend to watch: large LLM crawlers that disregard robots.txt and mimic user traffic mark a shift in scraping tactics, one that pushes ongoing infrastructure costs onto small forges.