---
license: apache-2.0
datasets:
- databricks/databricks-dolly-15k
metrics:
- accuracy
base_model:
- meta-llama/Meta-Llama-3.1-8B-Instruct
---

## Usage

Support for this model will be added in an upcoming `transformers` release. In the meantime, please install the library from source:

~~~shell
pip install git+https://github.com/huggingface/transformers
~~~

We can then run inference on this model:

~~~python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model
model_path = "YaoLuzjut/Llama-3.1-7.2B-It-Dolly"
tokenizer = AutoTokenizer.from_pretrained(model_path)
device = 'cuda'
dtype = torch.bfloat16
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=dtype, device_map=device)

# Prepare the input text
prompt = 'Complete the paragraph: our solar system is'
inputs = tokenizer.encode(prompt, return_tensors='pt').to(model.device)

# Generate the output
outputs = model.generate(inputs, max_length=20)

# Decode and print the output
output_text = tokenizer.decode(outputs[0])
print(output_text)
~~~

## Evaluation Results

Zero-shot performance, evaluated on selected benchmarks from the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness/tree/main) with additions (reported as accuracy ± standard error). A sketch of how these scores might be reproduced is given after the citation below.

| PIQA | HellaSwag | OpenBookQA | ARC-e | ARC-c | MMLU | CMMLU | WinoGrande |
| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |
| 0.7709±0.0098 | 0.5541±0.0050 | 0.3000±0.0205 | 0.7424±0.0090 | 0.4838±0.0146 | 0.6753±0.0038 | 0.5522±0.0045 | 0.7032±0.0128 |

## Citation

~~~bibtex
@article{lu2024reassessing,
  title={Reassessing Layer Pruning in LLMs: New Insights and Methods},
  author={Lu, Yao and Cheng, Hao and Fang, Yujie and Wang, Zeyu and Wei, Jiaheng and Xu, Dongwei and Xuan, Qi and Yang, Xiaoniu and Zhu, Zhaowei},
  journal={arXiv preprint arXiv:2411.15558},
  year={2024}
}
~~~
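
## Reproducing the Evaluation

The card does not state the exact harness configuration used for the table above. As a rough starting point, the command below runs a zero-shot evaluation of this checkpoint on the same benchmarks; the task names, `dtype`, and batch-size settings are assumptions, not the authors' confirmed setup.

~~~shell
# Zero-shot evaluation sketch with the LM Evaluation Harness.
# Task selection and model_args are assumptions; adjust to match your setup.
pip install lm_eval

lm_eval --model hf \
  --model_args pretrained=YaoLuzjut/Llama-3.1-7.2B-It-Dolly,dtype=bfloat16 \
  --tasks piqa,hellaswag,openbookqa,arc_easy,arc_challenge,mmlu,cmmlu,winogrande \
  --num_fewshot 0 \
  --batch_size auto
~~~

Running the `mmlu` and `cmmlu` task groups prints per-subject scores in addition to the aggregate numbers reported in the table.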