VinaLlama2-14B Beta
GGUF version: VinaLlama2-14B-GGUF
Top Features:
- Context Length: 32,768 tokens.
- Strong at reasoning, mathematics, and creative writing.
- Works with LangChain agents out of the box.
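The 32,768-token context window is shared between the prompt and the generated tokens, so a long prompt leaves less room for the reply. A minimal sketch of that budget check follows; `max_new_tokens_for` is a hypothetical helper written for illustration and is not part of `transformers`. It assumes you already know the prompt's token count (for example from `len(input_ids)`).

```python
# Context window stated in this model card; prompt and reply share it.
CONTEXT_LENGTH = 32_768


def max_new_tokens_for(prompt_tokens: int, requested: int = 1024) -> int:
    """Clamp a requested generation length to what still fits in the window.

    Hypothetical helper for illustration only (not a transformers API).
    """
    if prompt_tokens < 0 or requested < 0:
        raise ValueError("token counts must be non-negative")
    remaining = CONTEXT_LENGTH - prompt_tokens
    if remaining <= 0:
        raise ValueError(
            f"prompt uses {prompt_tokens} tokens, leaving no room in the "
            f"{CONTEXT_LENGTH}-token window"
        )
    return min(requested, remaining)


print(max_new_tokens_for(200))     # 1024: a short prompt leaves room for the full request
print(max_new_tokens_for(32_000))  # 768: a long prompt forces the request to be clamped
```

The returned value could be passed as `max_new_tokens` when generating, so the total sequence never exceeds the window.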
Known Issues
- Still struggles with Vietnamese facts (e.g. Hoang Sa & Truong Sa, historical questions).
- May hallucinate when reasoning.
- Can't do Vi-En/En-Vi translation (yet)!
Quick use:
VRAM Requirement: ~20GB
pip install transformers accelerate
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
device = "cuda" # the device to load the model onto
model = AutoModelForCausalLM.from_pretrained(
    "vilm/VinaLlama2-14B",
    torch_dtype='auto',
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("vilm/VinaLlama2-14B")
prompt = "Một cộng một bằng mấy?"
messages = [
    {"role": "system", "content": "Bạn là trợ lí AI hữu ích."},  # "You are a helpful AI assistant."
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)
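generated_ids = model.generate(
    model_inputs.input_ids,
    attention_mask=model_inputs.attention_mask,
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,  # temperature only applies when sampling is enabled
    temperature=0.25,
)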
# Drop the prompt tokens so only the newly generated tokens remain
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
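For a decoder-only model, `generate` returns each prompt followed by its newly generated tokens, which is why the snippet above slices every output by the length of its input. The sketch below shows the same slicing on plain lists of made-up token IDs, so it runs without downloading the model; `strip_prompt` is a hypothetical name used only for illustration.

```python
from typing import List


def strip_prompt(
    input_ids: List[List[int]], output_ids: List[List[int]]
) -> List[List[int]]:
    """Remove the echoed prompt from the front of each generated sequence."""
    return [out[len(inp):] for inp, out in zip(input_ids, output_ids)]


# Toy token IDs (not real vocabulary entries): each output starts with its prompt.
prompts = [[1, 2, 3], [7, 8]]
outputs = [[1, 2, 3, 40, 41], [7, 8, 90]]

print(strip_prompt(prompts, outputs))  # [[40, 41], [90]]
```

Without this step, decoding would repeat the system and user messages in front of the model's answer.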