PhoneLM
PhoneLM-1.5B is a 1.5-billion-parameter decoder-only language model pre-trained on 1.1 trillion tokens. The following example shows how to run it with the Hugging Face transformers library:
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'mllmTeam/PhoneLM-1.5B'

# Load the model and tokenizer (the custom PhoneLM code requires trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map='cuda', trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenize the prompt and move the input tensors to the GPU
inp = tokenizer("Machine Learning is ", return_tensors="pt")
inp = {k: v.to('cuda') for k, v in inp.items()}

# Generate up to 256 tokens with nucleus sampling
out = model.generate(
    **inp,
    max_length=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.7
)

text = tokenizer.decode(out[0], skip_special_tokens=True)
print(text)
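
If GPU memory is tight, the model can also be loaded in half precision. This is a minimal sketch using standard transformers arguments; the dtype choice is an assumption and not part of the original card.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = 'mllmTeam/PhoneLM-1.5B'

# Assumption: loading in float16 to roughly halve GPU memory use;
# the example above loads the model in its default dtype.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map='cuda',
    torch_dtype=torch.float16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)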
PhoneLM 1.5B models are auto-regressive language models based on a decoder-only transformer architecture, with the following configuration:
Hidden Size | Layers | Heads | Sequence Length |
---|---|---|---|
2560 | 19 | 16 | 2048 |
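
These hyperparameters can also be read directly from the model's configuration. The sketch below assumes the custom PhoneLM configuration class exposes the standard Hugging Face attribute names, which is an assumption rather than something stated on this card.

from transformers import AutoConfig

# Assumption: the custom PhoneLM config uses the standard attribute names;
# if it does not, print(config) to see the actual field names.
config = AutoConfig.from_pretrained('mllmTeam/PhoneLM-1.5B', trust_remote_code=True)
print(config.hidden_size)              # expected 2560
print(config.num_hidden_layers)        # expected 19
print(config.num_attention_heads)      # expected 16
print(config.max_position_embeddings)  # expected 2048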
The training dataset used for PhoneLM comprises a filtered mixture of open-source large-scale datasets available on the Hugging Face Hub: DCLM-baseline (Li et al., 2024), StarCoder (Li et al., 2023), OpenWebMath (Paster et al., 2023), and Dolma (Soldaini et al., 2024).

The table below compares PhoneLM-1.5B with other small language models on common benchmarks (higher is better):
Model | HellaSwag | WinoGrande | PIQA | SciQ | BoolQ | ARC Easy | ARC Challenge | Average |
---|---|---|---|---|---|---|---|---|
PhoneLM-1.5B | 66.9 | 63.0 | 77.3 | 88.8 | 65.5 | 69.7 | 39.9 | 67.31 |
Pythia-1.4B | 52.0 | 57.2 | 71.1 | 79.2 | 63.2 | 53.9 | 28.3 | 57.84 |
OPT-1.3B | 53.7 | 59.0 | 71.0 | 78.1 | 57.2 | 51.3 | 28.0 | 56.90 |
BLOOM-1.1B | 43.0 | 54.9 | 67.2 | 74.6 | 59.1 | 45.4 | 25.6 | 52.83 |
TinyLlama-1.1B | 59.1 | 58.9 | 73.0 | 82.3 | 58.6 | 55.7 | 31.0 | 59.80 |
MobileLLaMA-1.4B | 56.1 | 59.4 | 73.0 | 81.9 | 56.7 | 55.8 | 30.3 | 59.03 |
MobiLlama-1B | 62.2 | 59.3 | 74.8 | 82.8 | 60.3 | 56.4 | 31.7 | 61.07 |
OpenELM-1.1B | 64.8 | 61.7 | 75.6 | 83.6 | 63.6 | 55.4 | 32.3 | 62.43 |
DCLM-1.4B | 53.6 | 66.3 | 77.0 | 94.0 | 71.4 | 74.8 | 41.2 | 68.33 |
SmolLM-1.7B | 49.6 | 60.9 | 75.8 | 93.2 | 66.0 | 76.4 | 43.5 | 66.49 |
Qwen 1.5-1.8B | 60.9 | 60.5 | 74.2 | 89.4 | 66.5 | 59.1 | 34.7 | 63.61 |
Galactica-1.3B | 41.0 | 54.4 | 63.8 | 87.7 | 62.0 | 58.6 | 30.5 | 56.86 |
StableLM 2-1.6B | 68.8 | 64.1 | 75.1 | 76.9 | 80.0 | 60.3 | 39.2 | 66.34 |
Cerebras-GPT-1.3B | 38.4 | 51.9 | 66.8 | 73.0 | 59.3 | 45.8 | 25.3 | 51.50 |
MiniCPM-1B | 67.5 | 63.7 | 75.1 | 91.0 | 70.5 | 62.9 | 38.1 | 66.97 |
MiniCPM-2B | 67.2 | 63.9 | 76.1 | 92.5 | 74.6 | 69.0 | 42.7 | 69.43 |
Gemma-2B | 71.4 | 65.2 | 78.4 | 91.4 | 69.9 | 72.3 | 42.0 | 70.09 |
Gemma 2-2B | 55.0 | 68.7 | 78.7 | 96.0 | 73.6 | 80.3 | 46.9 | 71.31 |
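
The Average column appears to be the unweighted mean of the seven task scores. A quick check in Python; the score list below is just the PhoneLM-1.5B row copied from the table.

# PhoneLM-1.5B scores from the table above:
# HellaSwag, WinoGrande, PIQA, SciQ, BoolQ, ARC Easy, ARC Challenge
scores = [66.9, 63.0, 77.3, 88.8, 65.5, 69.7, 39.9]
average = sum(scores) / len(scores)
print(round(average, 2))  # 67.31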
@misc{yi2024phonelmanefficientcapablesmall,
      title={PhoneLM: an Efficient and Capable Small Language Model Family through Principled Pre-training},
      author={Rongjie Yi and Xiang Li and Weikai Xie and Zhenyan Lu and Chenghua Wang and Ao Zhou and Shangguang Wang and Xiwen Zhang and Mengwei Xu},
      year={2024},
      eprint={2411.05046},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2411.05046},
}