Croqaz shows how he built Vintage LLM, a Llama-style model trained on English books, newspapers, and other texts published before 1900. He covers corpus selection, cleaning, tokenizer choices, training setup, evaluation, and how pre-20th-century English affects model behavior.
