DoReMi

Key results

DoReMi improves average downstream accuracy by 6.5 percentage points on generative few-shot tasks over a baseline model trained with The Pile's default domain weights, and it reaches the baseline's downstream accuracy 2.6x faster.
Iterative DoReMi
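- Run DoReMi for multiple rounds, setting the reference domain weights $\alpha_{\text{ref}}$ for the next round to the $\bar{\alpha}$ produced by the previous round
- Stop iterating once the domain weights converge, i.e. when $\left\|\bar{\alpha}-\alpha_{\text{ref}}\right\|_{\infty}$ falls below a small threshold ($10^{-3}$ in the paper)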
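
The iterative procedure above can be sketched as a simple loop. This is only a minimal sketch: `run_round` is a hypothetical callable standing in for one full DoReMi round (train reference model, train proxy model, average the domain weights), and the default tolerance `tol=1e-3` and the `max_rounds` cap are assumptions chosen for illustration.

```python
import numpy as np


def iterate_doremi(run_round, alpha_ref, tol=1e-3, max_rounds=10):
    """Repeat DoReMi rounds until the domain weights stop changing.

    run_round is a hypothetical callable: it takes the reference weights
    and returns the averaged weights (alpha bar) from one DoReMi round.
    Returns the final weights and the number of rounds that were run.
    """
    alpha_ref = np.asarray(alpha_ref, dtype=float)
    for round_idx in range(1, max_rounds + 1):
        alpha_bar = np.asarray(run_round(alpha_ref), dtype=float)
        # Infinity norm: largest absolute change in any single domain weight.
        delta = np.max(np.abs(alpha_bar - alpha_ref))
        if delta < tol:
            return alpha_bar, round_idx
        alpha_ref = alpha_bar  # becomes the reference weights for the next round
    return alpha_ref, max_rounds


# Toy stand-in for a real round: move halfway towards a fixed target each time.
target = np.array([0.7, 0.2, 0.1])
weights, rounds = iterate_doremi(
    lambda a: 0.5 * a + 0.5 * target,
    alpha_ref=np.full(3, 1.0 / 3.0),
)
print(rounds, weights)
```

The toy round is only there to make the loop runnable; in practice each call to `run_round` would be a full training run, so the round count matters far more than the cost of the convergence check.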
Evaluation
- TriviaQA
- NaturalQuestions
- WebQuestions
- SQuADv2
- LAMBADA
Other notes
- Despite the relatively poor quality of the 1B proxy model, the domain weights still allow the 1B main model to achieve the baseline performance over 2x faster
wikicorpus
try a different batch size for the overall ratio
run the training again to see how many idxs are left after each iteration (small)
if nothing works, try the doremi sampler
ask for help
in a batch, take the size of 4 microbatches
if run_dataloader works, then the issue may be the state of the sampler when the engine passes it around
check which one triggers the stop during training
increase/decrease one sample in the large portion
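
One way to read the last note: when a batch is split across domains by weight, rounding can leave the per-domain counts summing to slightly more or less than the batch size, and the difference can be absorbed by the largest domain. The sketch below illustrates that idea only. `domain_sample_counts` is a hypothetical helper, not the actual DoReMi sampler, and the weights are made-up values.

```python
def domain_sample_counts(weights, batch_size):
    """Split batch_size samples across domains in proportion to weights.

    Hypothetical helper for illustration; assumes weights sum to 1.
    """
    counts = [int(round(w * batch_size)) for w in weights]
    # Rounding can leave the total off by a few samples; add or remove
    # the difference in the domain with the largest weight.
    largest = max(range(len(weights)), key=lambda i: weights[i])
    counts[largest] += batch_size - sum(counts)
    return counts


counts = domain_sample_counts([0.7, 0.2, 0.1], batch_size=16)
print(counts, sum(counts))
```

Adjusting the largest domain keeps the relative error small, since one extra or missing sample changes its share the least.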