Key results
DoReMi improves average downstream accuracy over a baseline model trained on The Pile’s default domain weights by 6.5 percentage points on generative few-shot tasks, and reaches the baseline downstream accuracy 2.6x faster.
Iterative DoReMi
Evaluation
Other notes
wikicorpus
try a different batch size for the overall ratio
run the training again to see how many idxs are left after each iteration (small)
if nothing works, try the doremi sampler
ask for help
in a batch, take the size of 4 microbatches
if run_dataloader works, then maybe it's about the state of the sampler when the engine passes it around
check which one triggers the stop during training
increase/decrease one sample in the large portion
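
A rough sketch for the "how many idxs are left after each iteration" and "which one triggers the stop" checks above. This is a toy stand-in, not the actual doremi sampler: `simulate_sampler`, its arguments, and the rounding rule (remainder goes to the largest-weight domain) are all assumptions made for illustration. It draws without replacement from each domain according to fixed weights and reports which domain runs out first.

```python
import random


def simulate_sampler(domain_sizes, domain_weights, batch_size, seed=0):
    """Toy domain-weighted sampler without replacement (hypothetical helper).

    Returns (history, short) where history is a list of
    (step, remaining idxs per domain) and short lists the domains that
    could not fill their share of the next batch.
    """
    rng = random.Random(seed)

    # One shuffled pool of indices per domain.
    pools = []
    for size in domain_sizes:
        idxs = list(range(size))
        rng.shuffle(idxs)
        pools.append(idxs)

    # Samples per domain per batch. Assumption: any rounding remainder
    # is given to the domain with the largest weight.
    per_domain = [int(w * batch_size) for w in domain_weights]
    largest = max(range(len(domain_weights)), key=lambda i: domain_weights[i])
    per_domain[largest] += batch_size - sum(per_domain)

    history = []
    step = 0
    while True:
        # Domains that no longer have enough idxs for a full share.
        short = [d for d, n in enumerate(per_domain) if len(pools[d]) < n]
        if short:
            return history, short
        # Consume this batch's share from every domain.
        for d, n in enumerate(per_domain):
            del pools[d][:n]
        step += 1
        history.append((step, [len(p) for p in pools]))


history, short = simulate_sampler([100, 20], [0.75, 0.25], batch_size=8)
for step, remaining in history:
    print(step, remaining)
print("stopped by domain(s):", short)
```

In this toy setup the small domain is exhausted long before the large one, so changing the large domain's size by one sample should not change the step at which iteration stops. If it does in the real run, the rounding of per-domain counts is a likely place to look.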