|
--- |
|
license: apache-2.0 |
|
library_name: transformers |
|
base_model: BSC-LT/salamandra-7b-instruct |
|
pipeline_tag: text-generation |
|
language: |
|
- bg |
|
- ca |
|
- code |
|
- cs |
|
- cy |
|
- da |
|
- de |
|
- el |
|
- en |
|
- es |
|
- et |
|
- eu |
|
- fi |
|
- fr |
|
- ga |
|
- gl |
|
- hr |
|
- hu |
|
- it |
|
- lt |
|
- lv |
|
- mt |
|
- nl |
|
- nn |
|
- 'no'
|
- oc |
|
- pl |
|
- pt |
|
- ro |
|
- ru |
|
- sh |
|
- sk |
|
- sl |
|
- sr |
|
- sv |
|
- uk |
|
--- |
|
|
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/633b489acbdbadd99c0b75ef/0AxppoCn6DIgZj6jp7feW.png) |
|
|
|
# Salamandra-7b-instruct-gptq Model Card |
|
|
|
This model is the GPTQ-quantized version of [Salamandra-7b-instruct](https://huggingface.co/BSC-LT/salamandra-7b-instruct), intended for speculative decoding.
|
|
|
The model weights are quantized from FP16 to W4A16 (4-bit weights and FP16 activations) using the [GPTQ](https://arxiv.org/abs/2210.17323) algorithm. |
|
Inference with this model can be performed using [vLLM](https://docs.vllm.ai/en/stable/models/engine_args.html).
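
The exact quantization recipe (calibration dataset, group size, etc.) behind this checkpoint is not documented in this card. As a rough, hypothetical sketch only, a comparable W4A16 GPTQ checkpoint could be produced with the GPTQ integration in ``transformers`` (which requires ``optimum`` and ``auto-gptq``); every setting below is an assumption rather than the recipe actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_model = "BSC-LT/salamandra-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Assumed settings: 4-bit weights, C4 calibration data, default group size.
# The recipe actually used for this checkpoint may differ.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization runs while the FP16 weights are being loaded.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    device_map="auto",
    quantization_config=gptq_config,
)

model.save_pretrained("salamandra-7b-instruct-gptq")
tokenizer.save_pretrained("salamandra-7b-instruct-gptq")
```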
|
|
|
Salamandra is a highly multilingual model pre-trained from scratch that comes in three different
sizes — 2B, 7B and 40B parameters — with their respective base and instruction-tuned variants.
It is promoted and financed by the Government of Catalonia through the [Aina Project](https://projecteaina.cat/)
and by the _Ministerio para la Transformación Digital y de la Función Pública_, funded by the EU through NextGenerationEU,
within the framework of the [ILENIA Project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.
|
|
|
This model card corresponds to the GPTQ-quantized version of Salamandra-7b-instruct for speculative decoding.
|
|
|
The entire Salamandra family is released under a permissive [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).
|
|
|
|
|
## How to Use |
|
|
|
The following example code was tested under ``Python 3.9.16``, ``vllm==0.6.3.post1``, ``torch==2.4.0`` and ``torchvision==0.19.0``,
though it should also run with more recent versions of these libraries. It implements a simple conversational chatbot using the model:
|
|
|
```python
from vllm import LLM, SamplingParams

model_name = "BSC-LT/salamandra-7b-instruct-gptq"
llm = LLM(model=model_name)

messages = []

while True:
    user_input = input("user >> ")
    if user_input.lower() == "exit":
        print("Chat ended.")
        break

    # Add the user turn to the running conversation history.
    messages.append({'role': 'user', 'content': user_input})

    # Generate the assistant's reply; token id 5 marks the end of a turn.
    outputs = llm.chat(messages,
                       sampling_params=SamplingParams(
                           temperature=0.5,
                           stop_token_ids=[5],
                           max_tokens=200)
                       )[0].outputs

    model_output = outputs[0].text
    print(f'assistant >> {model_output}')
    messages.append({'role': 'assistant', 'content': model_output})
```
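
Since this checkpoint is published for speculative decoding, a natural setup is to use it as the draft model for the full-precision Salamandra-7b-instruct target. The card does not prescribe this configuration, so the sketch below is an assumption; the engine arguments follow ``vllm==0.6.3`` and have been renamed in later vLLM releases:

```python
from vllm import LLM, SamplingParams

# Hypothetical speculative-decoding setup (not prescribed by this card):
# the FP16 model is the target and this GPTQ checkpoint drafts tokens for it.
llm = LLM(
    model="BSC-LT/salamandra-7b-instruct",                   # target model (FP16)
    speculative_model="BSC-LT/salamandra-7b-instruct-gptq",  # draft model (W4A16)
    num_speculative_tokens=5,    # tokens proposed by the draft model per step
    use_v2_block_manager=True,   # required for speculative decoding on some vLLM versions
)

params = SamplingParams(temperature=0.0, max_tokens=64)
print(llm.generate(["La capital de Catalunya és"], params)[0].outputs[0].text)
```

The quantized model can also be served on its own through vLLM's OpenAI-compatible server, e.g. ``vllm serve BSC-LT/salamandra-7b-instruct-gptq``.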
|
|
|
### Author |
|
International Business Machines (IBM). |
|
|
|
### Copyright |
|
International Business Machines (IBM). |
|
|
|
### Contact |
|
For further information, please send an email to <langtech@bsc.es>. |
|
|
|
### Acknowledgements |
|
We appreciate the collaboration with IBM in this work. |
|
Specifically, the IBM team created the GPTQ-quantized version of the Salamandra-7b-instruct model for speculative decoding that is released here.
|
|
|
### Disclaimer |
|
Be aware that the model may contain biases or other unintended distortions. |
|
When third parties deploy systems or provide services based on this model, or use the model themselves, |
|
they bear the responsibility for mitigating any associated risks and ensuring compliance with applicable |
|
regulations, including those governing the use of Artificial Intelligence. |
|
|
|
Barcelona Supercomputing Center and International Business Machines shall |
|
not be held liable for any outcomes resulting from third-party use. |
|
|
|
### License |
|
[Apache License, Version 2.0](https://www.apache.org/licenses/LICENSE-2.0) |