Difference between this version and togethercomputer's

#1
by tsteffek - opened

Hi, I'm currently deciding which version to use for document classification on medical texts, and I came across both this version and https://huggingface.co/togethercomputer/m2-bert-80M-32k. In a GitHub issue, Daniel Fu mentions that V1 has seen legal texts, so the two versions seem to differ in some way, but I couldn't find a comprehensive list of the differences. Is that documented somewhere?

While trying the two models, I also noticed that this version does indeed emit the FlashFFT warnings mentioned on GitHub, while the other one doesn't. So is only one of them using FlashFFT? (And can these warnings be safely ignored when fine-tuning further?)
