mfajcik committed
Commit e6568ef (1 parent: 00cbbb7)

Update README.md

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -37,7 +37,7 @@ The model was trained on 3 corpora, which were hot-swapped during the training.
  <img src="figures/tloss_full.png" width="900"/>
  Figure 1: Training loss.
  <img src="figures/tloss_closeup.png" width="900"/>
- Figure 2: Training loss closeup. We mark two hotswap places, where the training corpus #1 was switched for internal-corpus #2 and internal-corpus #2.1 respectively.
+ Figure 2: Training loss close-up. We mark the two hot-swap points where training corpus #1 was switched for internal-corpus #2 and internal-corpus #2.1, respectively. The flat region between 112k and 119.5k steps is caused by missing data; these logs were lost in an accident.
 
  Additionally, we perform two ablations:
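The corpus hot-swapping mentioned in the caption — replacing the data source feeding the training loop at preset step boundaries — can be sketched as below. This is a minimal illustration only: the step thresholds and corpus names in `SCHEDULE` are hypothetical, not the model's actual training configuration.

```python
def corpus_for_step(step, schedule):
    """Return the name of the corpus active at a given training step.

    `schedule` is a list of (start_step, corpus_name) pairs sorted by
    start_step; the entry with the largest start_step <= step is active.
    """
    active = schedule[0][1]
    for start, name in schedule:
        if step >= start:
            active = name
    return active


# Hypothetical schedule: train on corpus #1, then hot-swap twice.
# The step boundaries here are illustrative, not the real ones.
SCHEDULE = [
    (0, "corpus-1"),
    (50_000, "internal-corpus-2"),
    (100_000, "internal-corpus-2.1"),
]

if __name__ == "__main__":
    for step in (0, 49_999, 50_000, 120_000):
        print(step, corpus_for_step(step, SCHEDULE))
```

A training loop would call `corpus_for_step` at each step (or at checkpoint boundaries) and rebuild its data loader whenever the returned corpus changes.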