Token generation speed is constrained by the speed at which the relevant HBM can be read, depending on model size and pipeline setup. Model sizes feasible for each year between 2023 and 2031 range from 10T in 2026 to 1.4 quadrillion in 2031, with pretraining compute and HBM specifications playing essential roles. Constraints on total params and active params from pretraining compute are key factors in determining model feasibility for each year.









