The authors describe how they scaled generative recommendation models from O(1M) to O(1B) parameters for Netflix recommendation tasks, improving training stability, computational efficiency, and evaluation methodology along the way. They address challenges in alignment, cold-start adaptation, and production deployment, and propose systematic strategies such as multi-token prediction and efficient decoding to optimize performance. The work offers insights into scaling laws for recommenders, efficiency in training and inference, and the benefits of multi-token prediction for large-scale generative recommenders.
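To make the multi-token prediction idea concrete, here is a minimal sketch: a shared trunk encodes the interaction history, and K separate heads each predict one of the next K items. All names and the toy mean-pooled trunk are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = 50  # illustrative catalog size (hypothetical)
D = 16      # hidden dimension (hypothetical)
K = 3       # number of future items predicted per step

# Shared parameters: one embedding table plus K independent output heads.
emb = rng.normal(scale=0.1, size=(VOCAB, D))
heads = [rng.normal(scale=0.1, size=(D, VOCAB)) for _ in range(K)]

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_token_predict(history):
    """Return K probability distributions, one per future offset."""
    h = emb[history].mean(axis=0)           # toy trunk: mean-pooled history
    return [softmax(h @ W) for W in heads]  # head k scores item at offset k+1

probs = multi_token_predict([1, 4, 7])
```

At inference time, predicting K items per forward pass is what enables the efficient decoding the summary refers to: one trunk evaluation amortizes over K output distributions instead of K autoregressive steps.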









