Quantization made by Richard Erkhov.

PersianMind-v1.0 - GGUF

Model creator: https://huggingface.co/universitytehran/
Original model: https://huggingface.co/universitytehran/PersianMind-v1.0/

Name	Quant method	Size
PersianMind-v1.0.Q2_K.gguf	Q2_K	2.4GB
PersianMind-v1.0.IQ3_XS.gguf	IQ3_XS	2.65GB
PersianMind-v1.0.IQ3_S.gguf	IQ3_S	2.79GB
PersianMind-v1.0.Q3_K_S.gguf	Q3_K_S	2.79GB
PersianMind-v1.0.IQ3_M.gguf	IQ3_M	2.95GB
PersianMind-v1.0.Q3_K.gguf	Q3_K	3.12GB
PersianMind-v1.0.Q3_K_M.gguf	Q3_K_M	3.12GB
PersianMind-v1.0.Q3_K_L.gguf	Q3_K_L	3.4GB
PersianMind-v1.0.IQ4_XS.gguf	IQ4_XS	3.45GB
PersianMind-v1.0.Q4_0.gguf	Q4_0	3.61GB
PersianMind-v1.0.IQ4_NL.gguf	IQ4_NL	3.63GB
PersianMind-v1.0.Q4_K_S.gguf	Q4_K_S	3.64GB
PersianMind-v1.0.Q4_K.gguf	Q4_K	3.85GB
PersianMind-v1.0.Q4_K_M.gguf	Q4_K_M	3.85GB
PersianMind-v1.0.Q4_1.gguf	Q4_1	4.0GB
PersianMind-v1.0.Q5_0.gguf	Q5_0	4.39GB
PersianMind-v1.0.Q5_K_S.gguf	Q5_K_S	4.39GB
PersianMind-v1.0.Q5_K.gguf	Q5_K	4.51GB
PersianMind-v1.0.Q5_K_M.gguf	Q5_K_M	4.51GB
PersianMind-v1.0.Q5_1.gguf	Q5_1	4.77GB
PersianMind-v1.0.Q6_K.gguf	Q6_K	5.21GB
PersianMind-v1.0.Q8_0.gguf	Q8_0	6.75GB
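One rough way to choose among the GGUF files above is to take the largest file that still fits in available memory with some headroom. Larger files generally lose less quality to quantization. The sketch below copies the sizes from the table. The fixed `overhead_gb` default of 1.0 GB for the context cache and runtime buffers is a hypothetical assumption, not a measured figure. Real memory use depends on the context length and on the runtime (for example llama.cpp). `pick_quant` is a hypothetical helper.

```python
from typing import Optional

# File sizes (GB) copied from the quantization table above.
QUANT_SIZES_GB = {
    "Q2_K": 2.4, "IQ3_XS": 2.65, "IQ3_S": 2.79, "Q3_K_S": 2.79, "IQ3_M": 2.95,
    "Q3_K": 3.12, "Q3_K_M": 3.12, "Q3_K_L": 3.4, "IQ4_XS": 3.45, "Q4_0": 3.61,
    "IQ4_NL": 3.63, "Q4_K_S": 3.64, "Q4_K": 3.85, "Q4_K_M": 3.85, "Q4_1": 4.0,
    "Q5_0": 4.39, "Q5_K_S": 4.39, "Q5_K": 4.51, "Q5_K_M": 4.51, "Q5_1": 4.77,
    "Q6_K": 5.21, "Q8_0": 6.75,
}


def pick_quant(budget_gb: float, overhead_gb: float = 1.0) -> Optional[str]:
    """Return the file name of the largest quant whose size plus a
    hypothetical fixed overhead fits within budget_gb, or None if none fits."""
    fitting = [
        (size, name)
        for name, size in QUANT_SIZES_GB.items()
        if size + overhead_gb <= budget_gb
    ]
    if not fitting:
        return None
    # Largest size wins; ties between equal sizes are broken by name.
    _, name = max(fitting)
    return f"PersianMind-v1.0.{name}.gguf"
```

For example, `pick_quant(6.0)` selects the Q5_1 file (4.77 GB plus 1.0 GB of assumed overhead). `pick_quant(3.0)` returns `None` because even Q2_K does not fit.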

Original model description:

license: cc-by-nc-sa-4.0
language:
- multilingual
- fa
- en
library_name: transformers
tags:
- text-generation-inference
inference: false
metrics:
- bleu
- comet
- accuracy
- perplexity
- spearmanr
pipeline_tag: text-generation
co2_eq_emissions:
  emissions: 232380


PersianMind

PersianMind is a cross-lingual Persian-English large language model. The model achieves state-of-the-art results on the Persian subset of the Belebele benchmark and on the ParsiNLU multiple-choice QA task. It also attains performance comparable to GPT-3.5-turbo on a Persian reading comprehension task.

Model Description

Developed by: Pedram Rostami, Ali Salemi, and Mohammad Javad Dousti
Model type: Language model
Languages: English and Persian
License: CC BY-NC-SA 4.0 (non-commercial use only)

How to Get Started with the Model

Use the code below to get started with the model. Running this code requires the sentencepiece and accelerate libraries, along with PyTorch and 🤗 Transformers.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Use the GPU if one is available; otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    device_map={"": device},
)
tokenizer = AutoTokenizer.from_pretrained(
    "universitytehran/PersianMind-v1.0",
)

# Conversation template: a fixed context followed by the user's turn
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "
CONTEXT = "This is a conversation with PersianMind. It is an artificial intelligence model designed by a team of " \
    "NLP experts at the University of Tehran to help you with various tasks such as answering questions, " \
    "providing recommendations, and helping with decision making. You can ask it anything you want and " \
    "it will do its best to give you accurate and relevant information."
PROMPT = "در مورد هوش مصنوعی توضیح بده."  # "Explain artificial intelligence."

model_input = TEMPLATE.format(context=CONTEXT, prompt=PROMPT)
input_tokens = tokenizer(model_input, return_tensors="pt").to(device)
# Greedy decoding with a mild repetition penalty
generate_ids = model.generate(**input_tokens, max_new_tokens=512, do_sample=False, repetition_penalty=1.1)
# The output contains the prompt tokens followed by the new tokens; decode only the new ones
prompt_length = input_tokens["input_ids"].shape[1]
model_output = tokenizer.decode(generate_ids[0, prompt_length:], skip_special_tokens=True, clean_up_tokenization_spaces=False)

print(model_output)
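The prompt format used above can be wrapped in two small string helpers. These helpers are hypothetical and are not part of the original code. Generation is not explicitly stopped at the end of PersianMind's turn, so the completion may continue with an invented next user turn. Under that assumption, `extract_reply` cuts the text at the first `\nYou:` marker. Because the helpers only manipulate strings, they could also be reused with the GGUF files in any runtime that accepts a raw prompt.

```python
# Same template as in the example above.
TEMPLATE = "{context}\nYou: {prompt}\nPersianMind: "


def build_prompt(context: str, prompt: str) -> str:
    """Fill the conversation template with a context and a user prompt."""
    return TEMPLATE.format(context=context, prompt=prompt)


def extract_reply(completion: str) -> str:
    """Keep only PersianMind's turn: drop anything from the first
    (assumed) '\\nYou:' marker onward and trim surrounding whitespace."""
    reply = completion.split("\nYou:", 1)[0]
    return reply.strip()
```

For example, `extract_reply(" Hello!\nYou: and then?")` returns `"Hello!"`.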

How to Quantize the Model

Quantized models can run on resource-constrained devices. Quantizing the model requires the bitsandbytes library. To quantize the model to 8-bit (INT8), use the code below.

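from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the weights in 8-bit (INT8) via bitsandbytes; recent Transformers versions
# expect load_in_8bit inside a BitsAndBytesConfig rather than as a direct argument
model = AutoModelForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    device_map="auto",
    low_cpu_mem_usage=True,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)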
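The core idea behind 8-bit weight quantization can be shown with symmetric absmax quantization. Scale a vector so that its largest magnitude maps to 127, round to integers, and divide by the same scale to recover approximate values. This is a simplified sketch. bitsandbytes' LLM.int8() additionally uses vector-wise scaling and keeps outlier features in 16-bit precision, which this code does not model.

```python
from typing import List, Tuple


def absmax_quantize(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 8-bit integers in [-127, 127] using one shared scale."""
    absmax = max(abs(v) for v in values)
    # Avoid dividing by zero for an all-zero vector.
    scale = 127 / absmax if absmax > 0 else 1.0
    quantized = [round(v * scale) for v in values]
    return quantized, scale


def absmax_dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats by undoing the scaling."""
    return [q / scale for q in quantized]
```

The rounding error per value is at most half a quantization step, which is `0.5 / scale`.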

Alternatively, you can quantize the model in 4-bit (NormalFloat4) with the following code.

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NormalFloat (NF4) quantization, with double quantization of the quantization constants
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "universitytehran/PersianMind-v1.0",
    quantization_config=quantization_config,
    device_map="auto",
)
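4-bit quantization in bitsandbytes is block-wise. Each block of weights is normalized by its own absmax, and each value is stored as a 4-bit index into a 16-entry codebook. The sketch below shows that mechanism with a hypothetical, evenly spaced codebook. NF4 instead places its 16 levels at quantiles of a normal distribution. `bnb_4bit_use_double_quant=True` further quantizes the per-block scales. This sketch reproduces neither of those features. The block size of 64 is an assumption for illustration.

```python
from typing import List, Tuple

# Hypothetical evenly spaced 16-level codebook over [-1, 1] (not the NF4 levels).
UNIFORM_CODEBOOK = [-1 + 2 * i / 15 for i in range(16)]


def quantize_blockwise_4bit(
    values: List[float], codebook: List[float], block_size: int = 64
) -> Tuple[List[int], List[float]]:
    """Normalize each block by its absmax, then store the index of the
    nearest codebook entry (16 entries fit in 4 bits)."""
    assert len(codebook) == 16
    indices: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        absmax = max(abs(v) for v in block) or 1.0  # all-zero block -> scale 1
        scales.append(absmax)
        for v in block:
            x = v / absmax  # normalized into [-1, 1]
            indices.append(min(range(16), key=lambda i: abs(codebook[i] - x)))
    return indices, scales


def dequantize_blockwise_4bit(
    indices: List[int], scales: List[float], codebook: List[float], block_size: int = 64
) -> List[float]:
    """Look up each index and rescale by its block's absmax."""
    return [codebook[idx] * scales[i // block_size] for i, idx in enumerate(indices)]
```

Per-block scales keep one large weight from crushing the precision of every other weight in the tensor. The cost is storing one scale per block, and double quantization is designed to shrink that cost.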

Evaluating Quantized Models

Model	Belebele (Persian)	Fa→En Translation (Comet)	En→Fa Translation (Comet)	Model Size	Tokens/sec
PersianMind (`BF16`)	73.9	83.61	79.44	13.7G	25.35
PersianMind (`INT8`)	73.7	82.32	78.61	7.2G	11.36
PersianMind (`NF4`)	70.2	82.07	80.36	3.9G	24.36

We evaluated the quantized models against the original model on several tasks. Specifically, we evaluated all models on the Persian subset of Belebele, a multiple-choice reading comprehension benchmark, and reported the accuracy of each model. We also evaluated the models on Persian-to-English and English-to-Persian translation, using the Persian-English subset of the Flores-200 dataset and reporting results with the Comet metric. Furthermore, we calculated the average number of tokens generated per second by each model while running the translation tasks. To assess resource efficiency, we measured each model's memory usage with the get_memory_footprint() method.
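The evaluation table can also be read as ratios against the BF16 baseline. The sketch below recomputes those ratios from the reported numbers and adds no new measurements. The size ratios come out somewhat above the ideal 0.5 and 0.25. A plausible explanation is that some layers stay unquantized and the quantization constants take extra space. INT8 is also notably slower, which is consistent with the extra work LLM.int8() performs, but the table alone does not establish the cause. `relative_to_baseline` is a hypothetical helper.

```python
# Values copied from the evaluation table above (Belebele accuracy, size in GB, tokens/sec).
RESULTS = {
    "BF16": {"belebele": 73.9, "size_gb": 13.7, "tok_per_s": 25.35},
    "INT8": {"belebele": 73.7, "size_gb": 7.2, "tok_per_s": 11.36},
    "NF4": {"belebele": 70.2, "size_gb": 3.9, "tok_per_s": 24.36},
}


def relative_to_baseline(results, baseline="BF16"):
    """Express each model's size and speed as a fraction of the baseline,
    and its Belebele accuracy as a difference in points."""
    base = results[baseline]
    return {
        name: {
            "size_ratio": round(r["size_gb"] / base["size_gb"], 3),
            "speed_ratio": round(r["tok_per_s"] / base["tok_per_s"], 3),
            "belebele_delta": round(r["belebele"] - base["belebele"], 1),
        }
        for name, r in results.items()
    }
```

Based on these ratios, NF4 keeps about 96% of the BF16 throughput with about 28.5% of its memory footprint, at a cost of 3.7 Belebele points.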

License

PersianMind is subject to Meta's Llama 2 Community License. It is further licensed under CC BY-NC-SA 4.0, which allows non-commercial use of the model. Commercial use of this model requires a written agreement, which must be obtained from the copyright holders listed as developers on this page. If you suspect any violations, please reach out to us.

Citation

If you find this model helpful, please cite the following paper.

BibTeX:

@misc{persianmind,
  title={{PersianMind: A Cross-Lingual Persian-English Large Language Model}},
  author={Rostami, Pedram and Salemi, Ali and Dousti, Mohammad Javad},
  year={2024},
  eprint={2401.06466},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}