ISTA-DASLab/Llama-2-7b-AQLM-2Bit-2x8-hf

SpiridonSunRotator committed on Feb 20

Commit 7b37574 • 1 Parent(s): 7e96183

Upload finetuned Llama-2-7b models

Files changed (1)

README.md +5 -2

@@ -6,12 +6,15 @@ Selected evaluation results for this and other models:
 | Model      | AQLM scheme | WikiText 2 PPL | Model size, Gb | Hub link                                                                 |
 |------------|-------------|----------------|----------------|--------------------------------------------------------------------------|
-| Llama-2-7b | 1x16        | 6.31           | 2.4            | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf) |
-| Llama-2-7b (THIS) | 2x8         | 7.98           | 2.2            | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-2x8-hf)  |
+| Llama-2-7b | 1x16        | 5.92            | 2.4            | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-1x16-hf) |
+| Llama-2-7b (THIS) | 2x8         | 6.69           | 2.2            | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-2x8-hf)  |
 | Llama-2-7b | 8x8         | 7.83           | 2.2            | [Link](https://huggingface.co/BlackSamorez/Llama-2-7b-AQLM-2Bit-8x8-hf)  |
 | Llama-2-13b| 1x16        | 5.41           | 4.1            | [Link](https://huggingface.co/BlackSamorez/Llama-2-13b-AQLM-2Bit-1x16-hf)|
 | Llama-2-70b| 1x16        | 3.96           | 18.8           | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-1x16-hf)|
 | Llama-2-70b| 2x8         | 4.83           | 18.2           | [Link](https://huggingface.co/BlackSamorez/Llama-2-70b-AQLM-2Bit-2x8-hf) |
 | Mixtral-8x7b| 1x16       | 4.37           | 12.6            | [Link](https://huggingface.co/BlackSamorez/Mixtral-8x7b-AQLM-2Bit-1x16-hf)|
+**UPD** (20.02.2024).
+We applied global finetuning on top of quantized model and improved results compared to first revision.
 To learn more about the inference, as well as the information on how to quantize models yourself, please refer to the [official GitHub repo](https://github.com/Vahe1994/AQLM).
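
To put the size of the change in this commit into numbers, here is a minimal sketch that computes the relative WikiText 2 perplexity reduction for the two updated Llama-2-7b rows. The perplexity values are copied from the removed and added table rows; the helper name `relative_reduction` is made up for this illustration and is not part of the AQLM repo.

```python
# Perplexity values copied from the removed (-) and added (+) table rows.
# Lower perplexity is better.
ppl_before = {"1x16": 6.31, "2x8": 7.98}
ppl_after = {"1x16": 5.92, "2x8": 6.69}


def relative_reduction(before: float, after: float) -> float:
    """Return the perplexity reduction as a percentage of the old value."""
    return (before - after) / before * 100.0


# Hypothetical helper output: one percentage per AQLM scheme.
reductions = {
    scheme: relative_reduction(ppl_before[scheme], ppl_after[scheme])
    for scheme in ppl_before
}

for scheme, pct in reductions.items():
    print(
        f"Llama-2-7b {scheme}: {ppl_before[scheme]} -> {ppl_after[scheme]} "
        f"({pct:.1f}% lower)"
    )
```

Under these numbers the 2x8 scheme (this model) gains noticeably more from finetuning than the 1x16 scheme, although 1x16 still has the lower absolute perplexity.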