Text Generation
Transformers
PyTorch
longllama
code
text-generation-inference
custom_code
Eval Results
syzymon committed on
Commit c682b6d
1 Parent(s): a9f5b43

Update README.md

Files changed (1)
  1. README.md +13 -10
README.md CHANGED
@@ -66,15 +66,18 @@ model-index:
 
 </div>
 
- <p align="center" width="100%">
- <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
- </p>
+
 
 ## TLDR
 This repository contains the research preview of **LongLLaMA, a large language model capable of handling long contexts of 256k tokens or even more**.
 
- LongLLaMA is built upon the foundation of [OpenLLaMA](https://github.com/openlm-research/open_llama) and fine-tuned using the [Focused Transformer (FoT)](https://arxiv.org/abs/2307.03170) method.
- LongLLaMA Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
+ LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
+
+ LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama; in particular, we improve **GSM8K math reasoning from 13% to 17.4%**.
+
+ <p align="center" width="100%">
+ <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
+ </p>
 
 ## Overview
 
@@ -84,7 +87,7 @@ LongLLaMA Code is built upon the foundation of [Code Llama](https://huggingface.
 
 **LongLLaMA** is an [OpenLLaMA](https://github.com/openlm-research/open_llama) model finetuned with the FoT method,
 with three layers used for context extension. **Crucially, LongLLaMA is able to extrapolate much beyond the context length seen in training: 8k. E.g., in the passkey retrieval task, it can handle inputs of length 256k**.
- **LongLLaMA Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
+ **LongLLaMA-Code** is a [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf) model finetuned with the FoT method.
 
 
  <div align="center">
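To make the passkey-retrieval claim above concrete, here is a rough probe one could run against the model: hide a short key inside long filler text and ask the model to repeat it back. This is an ad-hoc sketch, not the evaluation protocol from the FoT paper; the prompt template, filler text, and generation budget are invented for illustration, and `model`/`tokenizer` are assumed to be loaded as in the usage snippets changed further down in this diff.

```python
import random

import torch


def build_passkey_prompt(passkey: str, n_filler: int = 3000) -> str:
    """Bury `passkey` inside repetitive filler text (illustrative template only)."""
    filler = "The grass is green. The sky is blue. The sun is yellow. " * n_filler
    return (
        "Remember the pass key hidden in the text below.\n"
        + filler
        + f"\nThe pass key is {passkey}.\n"
        + filler
        + "\nWhat is the pass key? The pass key is"
    )


def run_passkey_probe(model, tokenizer, max_new_tokens: int = 8) -> str:
    """Ask the model to recover a randomly chosen passkey from a long context."""
    passkey = str(random.randint(10000, 99999))
    input_ids = tokenizer(build_passkey_prompt(passkey), return_tensors="pt").input_ids
    with torch.no_grad():
        output = model.generate(input_ids, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens.
    answer = tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True)
    return f"expected {passkey}, got {answer.strip()}"
```

Increasing `n_filler` pushes the prompt toward the longer contexts the model card claims it can handle; the filler sentence itself is arbitrary.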
@@ -159,9 +162,9 @@ LongLLaMA has several other parameters:
 import torch
 from transformers import LlamaTokenizer, AutoModelForCausalLM
 
- tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
+ tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
 model = AutoModelForCausalLM.from_pretrained(
- "syzymon/long_llama_3b_v1_1", torch_dtype=torch.float32,
+ "syzymon/long_llama_code_7b", torch_dtype=torch.float32,
 mem_layers=[],
 mem_dtype='bfloat16',
  trust_remote_code=True,
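For reference, a minimal end-to-end sketch of how the snippet above is typically used once the checkpoint name is switched to `syzymon/long_llama_code_7b`. The prompt, sampling settings, and token budget are arbitrary choices for illustration; the load call mirrors the memory-related arguments shown in the diff and leaves the model's other FoT-specific parameters at their defaults.

```python
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = AutoModelForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b",
    torch_dtype=torch.float32,
    mem_layers=[],
    mem_dtype="bfloat16",
    trust_remote_code=True,
)

# Arbitrary example prompt; any text or code prefix works the same way.
prompt = "def fibonacci(n):"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=64,   # illustrative budget
        do_sample=True,
        temperature=0.7,
        top_p=0.95,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```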
@@ -177,8 +180,8 @@ model = AutoModelForCausalLM.from_pretrained(
 from transformers import LlamaTokenizer, LlamaForCausalLM
 import torch
 
- tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_3b_v1_1")
- model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_3b_v1_1", torch_dtype=torch.float32)
+ tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
+ model = LlamaForCausalLM.from_pretrained("syzymon/long_llama_code_7b", torch_dtype=torch.float32)
  ```
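The plain `LlamaForCausalLM` load above treats the checkpoint as a standard Llama model, so it plugs into the usual Hugging Face tooling (presumably without the FoT memory mechanism, so long-context behavior should not be expected from this path). A small illustrative example with the `transformers` text-generation pipeline follows; the prompt and generation arguments are placeholders.

```python
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, pipeline

tokenizer = LlamaTokenizer.from_pretrained("syzymon/long_llama_code_7b")
model = LlamaForCausalLM.from_pretrained(
    "syzymon/long_llama_code_7b", torch_dtype=torch.float32
)

# Standard transformers pipeline; generation settings below are illustrative only.
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
result = generator(
    "# A function that checks whether a number is prime\n",
    max_new_tokens=48,
    do_sample=True,
    temperature=0.7,
)
print(result[0]["generated_text"])
```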