In-Context Pretraining: Language Modeling Beyond Document Boundaries
HiPPO: Recurrent Memory with Optimal Polynomial Projections
Paloma: A Benchmark for Evaluating Language Model Fit
LLaVA: Large Language and Vision Assistant
Training and inference of large language models using 8-bit floating point
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts
Zero Bubble Pipeline Parallelism
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension