justheuristic committed
Commit a0f548a
1 Parent(s): 8e8cdbc

Create model card

Files changed (1)
  1. README.md +25 -0
README.md ADDED
@@ -0,0 +1,25 @@
---
library_name: transformers
tags:
- llama
- facebook
- meta
- llama-3
- conversational
- text-generation-inference
---

An official quantization of [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) using [PV-Tuning](https://arxiv.org/abs/2405.14852) on top of [AQLM](https://arxiv.org/abs/2401.06118).

For this quantization, we used 1 codebook of 16 bits for groups of 8 weights.
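In other words, every group of 8 weights shares a single 16-bit code, which works out to 16 / 8 = 2 bits per weight for the quantized layers (codebook storage and any layers kept in higher precision add a small overhead), hence the "2Bit" label in the repository name.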

| Model | AQLM scheme | WikiText-2 PPL | Model size, GB | Hub link |
|------------|-------------|----------------|----------------|--------------------------------------------------------------------------|
| meta-llama/Meta-Llama-3-8B (this) | 1x16g8 | 6.99 | 4.1 | [Link](https://huggingface.co/ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16) |
| meta-llama/Meta-Llama-3-70B | 1x16g8 | 4.57 | 21.9 | [Link](https://huggingface.co/ISTA-DASLab/Llama-2-70b-AQLM-PV-2Bit-1x16-hf) |

The 1x16g16 (1-bit) models are on the way and will be released as soon as we update the inference library with the corresponding kernels.

To learn more about running inference, as well as how to quantize models yourself, please refer to the [official GitHub repo](https://github.com/Vahe1994/AQLM).
The original code for PV-Tuning can be found in the [AQLM@pv-tuning](https://github.com/Vahe1994/AQLM/tree/pv-tuning) branch.
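
As a convenience, a minimal inference sketch is shown below. It assumes a recent `transformers` release with native AQLM support and the `aqlm` inference library installed (see the GitHub repo above for the exact requirements); the prompt and generation settings are purely illustrative.

```python
# Assumed setup (check the AQLM repo for exact versions):
#   pip install transformers aqlm[gpu]
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Meta-Llama-3-8B-AQLM-PV-2Bit-1x16"  # repo from the table above

# Load the quantized checkpoint; the AQLM kernels are used automatically when `aqlm` is installed
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Quick generation smoke test
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```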