--- license: apache-2.0 library_name: transformers base_model: - openchat/openchat-3.5-0106 datasets: - Yukang/LongAlpaca-12k model-index: - name: OpenChat-3.5-0106_32K-PoSE results: - task: type: text-generation name: Text Generation dataset: name: IFEval (0-Shot) type: HuggingFaceH4/ifeval args: num_few_shot: 0 metrics: - type: inst_level_strict_acc and prompt_level_strict_acc value: 39.69 name: strict accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: BBH (3-Shot) type: BBH args: num_few_shot: 3 metrics: - type: acc_norm value: 8.83 name: normalized accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MATH Lvl 5 (4-Shot) type: hendrycks/competition_math args: num_few_shot: 4 metrics: - type: exact_match value: 1.44 name: exact match source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: GPQA (0-shot) type: Idavidrein/gpqa args: num_few_shot: 0 metrics: - type: acc_norm value: 3.47 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MuSR (0-shot) type: TAUR-Lab/MuSR args: num_few_shot: 0 metrics: - type: acc_norm value: 11.33 name: acc_norm source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard - task: type: text-generation name: Text Generation dataset: name: MMLU-PRO (5-shot) type: TIGER-Lab/MMLU-Pro config: main split: test args: num_few_shot: 5 metrics: - type: acc value: 11.46 name: accuracy source: url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=Pretergeek/OpenChat-3.5-0106_32K-PoSE name: Open LLM Leaderboard ---
# OpenChat-3.5-0106_32K-PoSE ## Description This model is [Openchat-3.5-0106](https://huggingface.co/openchat/openchat-3.5-0106) with the context length extended from 8192 tokens to 32768 tokens using [PoSE](https://huggingface.co/papers/2309.10400). The model was fine-tuned using [Rank-Stabilized LoRA](https://huggingface.co/blog/damjan-k/rslora) and the [LongAlpaca-12K](Yukang/LongAlpaca-12k) dataset. I hope to continue extending the context in future versions and then apply the same methods to my [upscaled versions of OpenChat-3.5](https://huggingface.co/collections/Pretergeek/openchat-35-0106-with-additional-layers-66a8d3262c7c3ebdd7783a29) that were created using Block Expansion instead of Depth UP Scaling. After fine-tuning, the model was tested using passkey retrieval and achieved a score of 100%. Below you can also find the results of the Open LLM Leaderboard evaluations and I am a bit disappointed with those. The model ended up with a significant reduction in performance compared to the original model in all but one test (MUSR). I expected it to do better than the original model on MUSR since that test benefits from long context understanding but I didn't expect such a negative impact on the other tasks. Anyway, I will be addressing this on a future version. I used the LongAlpaca-12K dataset because it is small and I have limited computational resources but I might have to try a larger dataset for the next attempt. If you would like to help me, there are links on the top of the model card for my Patreon and Ko-Fi. # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard) Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_Pretergeek__OpenChat-3.5-0106_32K-PoSE) | Metric |Value| |-------------------|----:| |Avg. |12.70| |IFEval (0-Shot) |39.69| |BBH (3-Shot) | 8.83| |MATH Lvl 5 (4-Shot)| 1.44| |GPQA (0-shot) | 3.47| |MuSR (0-shot) |11.33| |MMLU-PRO (5-shot) |11.46| # Citation ``` @misc{zhu2024poseefficientcontextwindow, title={PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training}, author={Dawei Zhu and Nan Yang and Liang Wang and Yifan Song and Wenhao Wu and Furu Wei and Sujian Li}, year={2024}, eprint={2309.10400}, archivePrefix={arXiv}, primaryClass={cs.CL}, url={https://arxiv.org/abs/2309.10400}, } ```