syzymon committed
Commit 0f3a950
1 Parent(s): 3dea16c

Update README.md

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -106,7 +106,7 @@ This repository contains the research preview of **LongLLaMA, a large language m
 
 LongLLaMA-Code is built upon the foundation of [Code Llama](https://huggingface.co/codellama/CodeLlama-7b-hf).
 
-LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning.**.
+LongLLaMA-Code has **improved reasoning capabilities** compared to CodeLlama, in particular we improve **GSM8K math reasoning from 13% to 17.4% after just continued pre-training, no in-distribution fine-tuning**.
 
 <p align="center" width="100%">
 <img src="https://raw.githubusercontent.com/CStanKonrad/long_llama/main/assets/results.png" alt="LongLLaMA" style="width: 70%; min-width: 300px; display: block; margin: auto;">
@@ -129,8 +129,8 @@ with three layers used for context extension. **Crucially, LongLLaMA is able to
 |----------------|----------|----------|-----------|
 | Source model | [OpenLLaMA-3B](https://huggingface.co/openlm-research/open_llama_3b_easylm) | [OpenLLaMA-3Bv2](https://huggingface.co/openlm-research/open_llama_3b_v2_easylm) | [CodeLLaMA-7b-hf](https://huggingface.co/codellama/CodeLlama-7b-hf) |
 | Source model tokens | 1T | 1 T | 2T + 0.5 T |
-| Fine-tuning context | 8K | 32K | 32K |
-| Fine-tuning tokens | 10B | 5B | 35B |
+| Fine-tuning context | 8K | **32K** | **32K** |
+| Fine-tuning tokens | 10B | 5B | **35B** |
 | Memory layers | 6, 12, 18 | 6, 12, 18 | 8, 16, 24 |
 
 </div>
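
For readers who want to try the checkpoint this README change describes, a minimal loading sketch is below. It assumes the Hugging Face checkpoint id `syzymon/long_llama_code_7b`, loading the custom LongLLaMA modeling code with `trust_remote_code=True` (the repo is tagged `custom_code`), and bfloat16 weights; it is not the repository's official example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id; not confirmed by this diff.
MODEL_ID = "syzymon/long_llama_code_7b"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,   # assumed dtype; float32 also works on CPU
    trust_remote_code=True,       # LongLLaMA ships custom modeling code in the repo
)

prompt = "def quicksort(arr):"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Inputs longer than the fine-tuning context listed in the table are handled by the memory layers used for context extension; details of the memory cache usage are documented in the LongLLaMA repository.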