Text Generation
Transformers
PyTorch
longllama
code
text-generation-inference
custom_code
Eval Results
syzymon committed on
Commit c682b6d
1 Parent(s): a9f5b43

Update README.md

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -66,15 +66,18 @@ model-index:
 
 </div>
 
- <p align="center" width="100%">
- <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
- </p>
+
 
 ## TLDR
 This repository contains the research preview of **LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more**.
 
- LongLLaMA is built upon the foundation of [OpenLLaMA](https://github.com/openlm-research/open_llama) and fine-tuned using the [Focused Transformer (FoT)](https://arxiv.org/abs/2307.03170) method.
- LongLLaMA Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
+ LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
+
+ LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama; in particular, we improve **GSM8K math reasoning from 13% to 17.4%**.
+
+ <p align="center" width="100%">
+ <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
+ </p>
 
 ## Overview
 
@@ -84,7 +87,7 @@ LongLLaMA Code is built upon the foundation of [Code Llama](https://huggingface.
 
 **LongLLaMA** is an [OpenLLaMA](https://github.com/openlm-research/open_llama) model finetuned with the FoT method,
 with three layers used for context extension. **Crucially, LongLLaMA is able to extrapolate much beyond the context length seen in training: 8k. E.g., in the passkey retrieval task, it can handle inputs of length 256k**.
- **LongLLaMA Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
+ **LongLLaMA-Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
 
 
  <div align="center">
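To make the passkey-retrieval claim above concrete, here is a rough probe one could run against the model: hide a short key inside long filler text and ask the model to repeat it back. This is an ad-hoc sketch, not the evaluation protocol from the FoT paper; the prompt template, filler text, and generation budget are invented for illustration, and `model`/`tokenizer` are assumed to be loaded as in the usage snippets changed further down in this diff.

```python
import random

import torch


def build_passkey_prompt(passkey: str, n_filler: int = 3000) -> str:
    """Bury `passkey` inside repetitive filler text (illustrative template only)."""
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    return (
        "Remember the pass key hidden in the text below.\n"
        + filler
        + f"\nThe pass key is {passkey}.\n"
        + filler
        + "\nWhat is the pass key? The pass key is"
    )


def run_passkey_probe(model, tokenizer, max_new_tokens: int = 8) -> str:
    """Ask the model to recover a randomly chosen passkey from a long context."""
    passkey = str(random.randint(10000, 99999))
    input_ids = tokenizer(build_passkey_prompt(passkey), return_tensors="pt").input_ids
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    answer = tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)
    return f"expected {passkey}, got {answer.strip()}"
```

Increasing `n_filler` pushes the prompt toward the longer contexts the model card claims it can handle; the filler sentence itself is arbitrary.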
@@ -159,9 +162,9 @@ LongLLaMA has several other parameters:
 import torch
 from transformers import LlamaTokenizer, AutoModelForCausalLM
 
- tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
+ tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
 model = AutoModelForCausalLM.from_pretrained(
- "syzymon/long_llama_3b_v1_1", torch_dtype=torch.float32,
+ "syzymon/long_llama_code_7b", torch_dtype=torch.float32,
 mem_layers=[],
 mem_dtype='bfloat16',
  trust_remote_code=True,
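For reference, a minimal end-to-end sketch of how the snippet above is typically used once the checkpoint name is switched to `syzymon/long_llama_code_7b`. The prompt, sampling settings, and token budget are arbitrary choices for illustration; the load call mirrors the memory-related arguments shown in the diff and leaves the model's other FoT-specific parameters at their defaults.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b",
    torch_dtype=torch.float32,
    mem_layers=[],
    mem_dtype="bfloat16",
    trust_remote_code=True,
)

# Arbitrary example prompt; any text or code prefix works the same way.
prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=64,   # illustrative budget
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```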
@@ -177,8 +180,8 @@ model = AutoModelForCausalLM.from_pretrained(
 from transformers import LlamaTokenizer, LlamaForCausalLM
 import torch
 
- tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
- model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_3b_v1_1", torch_dtype=torch.float32)
+ tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
+ model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_code_7b", torch_dtype=torch.float32)
  ```
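The plain `LlamaForCausalLM` load above treats the checkpoint as a standard Llama model, so it plugs into the usual Hugging Face tooling (presumably without the FoT memory mechanism, so long-context behavior should not be expected from this path). A small illustrative example with the `transformers` text-generation pipeline follows; the prompt and generation arguments are placeholders.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, pipeline

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = LlamaForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b", torch_dtype=torch.float32
)

# Standard transformers pipeline; generation settings below are illustrative only.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator(
    "# A function that checks whether a number is prime\n",
    max_new_tokens=48,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```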