dawg this is not a pretrained model. #3
opened by Dampish
How do you derive a model and call it pretrained?
Basically, MiniMA-3B is distilled from LLaMA2-7B on subsampled Pile, GitHub, and WuDao data, totaling 100+B tokens. In terms of data scale, MiniMA-3B is in a sense a (continuously) pretrained model.
And you are perhaps referring to MiniChat-3B (https://huggingface.co/GeneZC/MiniChat-3B), which is MiniMA-3B finetuned on instruction data and is indeed a finetuned model.
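For readers unfamiliar with how distillation produces such a model, here is a minimal sketch of the general technique (a temperature-scaled KL loss between teacher and student next-token distributions). The temperature, shapes, and random logits below are illustrative assumptions, not the actual MiniMA-3B training recipe or data pipeline.

```python
# Minimal sketch of logit distillation: the student is trained to match
# the teacher's softened next-token distribution. All hyperparameters
# here (temperature=2.0, batch of 4) are hypothetical placeholders.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label KL loss between teacher and student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    # batchmean KL, scaled by T^2 to keep gradient magnitudes comparable
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy usage: random logits stand in for the teacher (e.g., LLaMA2-7B)
# and the student (e.g., MiniMA-3B) over a shared vocabulary.
vocab_size = 32000
student_logits = torch.randn(4, vocab_size, requires_grad=True)
with torch.no_grad():
    teacher_logits = torch.randn(4, vocab_size)

loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
print(loss.item())
```

Because the student sees 100+B tokens of raw-corpus data (not instruction data) under this kind of objective, calling the result a pretrained model by data scale is the distinction being drawn above.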
GeneZC changed discussion status to closed