DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining

In-Context Pretraining: Language Modeling Beyond Document Boundaries

HiPPO: Recurrent Memory with Optimal Polynomial Projections

Slurm

Paloma: A Benchmark for Evaluating Language Model Fit

LLaVA: Large Language and Vision Assistant

Training and inference of large language models using 8-bit floating point

bitsandbytes

MegaBlocks: Efficient Sparse Training with Mixture-of-Experts

Zero Bubble Pipeline Parallelism

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

SQuAD: 100,000+ Questions for Machine Comprehension of Text