	---
	license: mit
	tags:
	- ctranslate2
	- quantization
	- int8
	- float16
	- text-generation
	- ALMA
	- llama
	---

	# ALMA-13B model for CTranslate2

	This model is a quantized version of [haoranxu/ALMA-13B](https://huggingface.co/haoranxu/ALMA-13B), converted with int8_float16 quantization for use with [CTranslate2](https://github.com/OpenNMT/CTranslate2).

	ALMA (Advanced Language Model-based trAnslator) is an LLM-based translation model, which adopts a new translation model paradigm: it begins with fine-tuning on monolingual data and is further optimized using high-quality parallel data. This two-step fine-tuning process ensures strong translation performance.

	- Model creator: [Haoran Xu](https://huggingface.co/haoranxu)
	- Original model: [ALMA 13B](https://huggingface.co/haoranxu/ALMA-13B)


	## Conversion details

	The original model was converted in December 2023 with the following command:

	```
	ct2-transformers-converter --model haoranxu/ALMA-13B --quantization int8_float16 --output_dir ALMA-13B-ct2-int8_float16 \
	--copy_files generation_config.json special_tokens_map.json tokenizer.model tokenizer_config.json
	```
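If you want to reproduce the conversion with a different quantization type or output directory, the command can be assembled from Python. This is a minimal sketch: `build_convert_command` is a hypothetical helper, not part of CTranslate2, and it only uses the flags that appear in the command above.

```python
import shlex


def build_convert_command(model, output_dir, quantization="int8_float16", copy_files=()):
    """Return the ct2-transformers-converter argv (hypothetical helper)."""
    argv = [
        "ct2-transformers-converter",
        "--model", model,
        "--quantization", quantization,
        "--output_dir", output_dir,
    ]
    if copy_files:
        # Extra tokenizer/config files copied next to the converted weights.
        argv += ["--copy_files", *copy_files]
    return argv


argv = build_convert_command(
    "haoranxu/ALMA-13B",
    "ALMA-13B-ct2-int8_float16",
    copy_files=[
        "generation_config.json",
        "special_tokens_map.json",
        "tokenizer.model",
        "tokenizer_config.json",
    ],
)
print(shlex.join(argv))

# To actually run the conversion (this downloads the full 13B checkpoint):
# import subprocess
# subprocess.run(argv, check=True)
```

The `subprocess.run` call is left commented out because the conversion needs the `ct2-transformers-converter` executable on the PATH and enough disk space and memory for the original model.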


	## Prompt template: ALMA

	```
	Translate this from English to Chinese:
	English: {prompt}
	Chinese:
	```
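The template above shows only English to Chinese. The sketch below fills the same template for other language pairs. It assumes that the other directions follow the same pattern with the language names swapped; `build_alma_prompt` and `SUPPORTED_LANGUAGES` are hypothetical names, and the language list is taken from the FAQ further down.

```python
# Languages paired with English, as listed in the FAQ below.
SUPPORTED_LANGUAGES = {"German", "Czech", "Icelandic", "Chinese", "Russian"}


def build_alma_prompt(text, source="English", target="Chinese", strict=True):
    """Fill the ALMA prompt template for one translation direction."""
    if strict:
        pair = {source, target}
        non_english = pair - {"English"}
        # Every listed direction has English on one side.
        if source == target or "English" not in pair or not non_english <= SUPPORTED_LANGUAGES:
            raise ValueError(f"Unsupported direction: {source} -> {target}")
    return f"Translate this from {source} to {target}:\n{source}: {text}\n{target}:"


print(build_alma_prompt("Who is Alan Turing?"))
```

Pass `strict=False` to try directions outside the listed ten; quality for those is not covered by this README.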


	## Example

	This example code is based on [CTranslate2_transformers](https://opennmt.net/CTranslate2/guides/transformers.html#mpt).
	More detailed information about the `generate_batch` method can be found at [CTranslate2_Generator.generate_batch](https://opennmt.net/CTranslate2/python/ctranslate2.Generator.html#ctranslate2.Generator.generate_batch).

	```python
	import ctranslate2
	import transformers
	from huggingface_hub import snapshot_download

	# ctranslate2.Generator expects a local model directory, so download the repository first.
	model_dir = snapshot_download("avans06/ALMA-13B-ct2-int8_float16")
	generator = ctranslate2.Generator(model_dir, device="auto")
	tokenizer = transformers.AutoTokenizer.from_pretrained("haoranxu/ALMA-13B")

	text = "Who is Alan Turing?"
	prompt = f"Translate this from English to Chinese:\nEnglish: {text}\nChinese:"
	# generate_batch takes token strings rather than token ids.
	tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

	results = generator.generate_batch(
	    [tokens],
	    max_length=256,
	    sampling_temperature=0.7,
	    sampling_topp=0.9,
	    repetition_penalty=1.1,
	    include_prompt_in_result=False,
	)

	output = tokenizer.decode(results[0].sequences_ids[0])
	print(output)
	```
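`generate_batch` accepts several tokenized prompts at once, so multiple sentences can be translated in one call. The helper below is a sketch of that pattern. `translate_batch` is a hypothetical name; it expects a `ctranslate2.Generator` and a Hugging Face tokenizer like the ones created in the example above, and it assumes the prompt template generalizes to other language names.

```python
def translate_batch(generator, tokenizer, texts, source="English", target="Chinese", **generate_kwargs):
    """Translate a list of sentences with one generate_batch call (hypothetical helper)."""
    prompts = [
        f"Translate this from {source} to {target}:\n{source}: {text}\n{target}:"
        for text in texts
    ]
    # generate_batch takes token strings, one list per prompt.
    batch = [tokenizer.convert_ids_to_tokens(tokenizer.encode(p)) for p in prompts]

    # Defaults mirror the single-sentence example; callers can override them.
    generate_kwargs.setdefault("max_length", 256)
    generate_kwargs.setdefault("include_prompt_in_result", False)

    results = generator.generate_batch(batch, **generate_kwargs)
    # Keep the first hypothesis of each result and trim surrounding whitespace.
    return [tokenizer.decode(r.sequences_ids[0]).strip() for r in results]
```

A call would look like `translate_batch(generator, tokenizer, ["Who is Alan Turing?", "Good morning."], repetition_penalty=1.1)`, which returns one translated string per input sentence.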


	## FAQ

	The following explanation is excerpted from the [FAQ section of the author's GitHub README](https://github.com/fe1ixxu/ALMA#what-language-directions-do-alma-support).

	- What language directions does ALMA support?
	Currently, ALMA supports 10 directions: English↔German, English↔Czech, English↔Icelandic, English↔Chinese, English↔Russian. However, it may surprise us in other directions :)


	## More information

	For more information about the original model, see its [GitHub repository](https://github.com/fe1ixxu/ALMA).