News
The researchers compared two versions of the OLMo-1B model, one trained on 2.3 trillion tokens and another on 3 trillion. Despite the larger training set, the more extensively trained model reportedly ...