license: apache-2.0
language:
- ru
- en
- de
- es
- it
- ja
- vi
- zh
- fr
- pt
- id
- ko
pipeline_tag: text-generation
🌍 Vulture-40B
Vulture-40B is a further fine-tuned, causal decoder-only LLM built by Virtual Interactive (VILM) on top of Falcon-40B by TII. We collected a new dataset of news articles and Wikipedia pages in 12 languages (80GB in total), continued the pretraining of Falcon-40B on it, and then constructed a multilingual instruction dataset following Alpaca's approach.
Technical Report coming soon 🤗
Prompt Format
The recommended prompt format is:
A chat between a curious user and an artificial intelligence assistant.
USER:{user's question}<|endoftext|>ASSISTANT:
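For example, here is a minimal sketch of assembling a prompt in this format (the build_prompt helper is illustrative, not part of the released code):
SYSTEM = "A chat between a curious user and an artificial intelligence assistant."

def build_prompt(question: str) -> str:
    # The user turn is closed with <|endoftext|>, after which the model
    # generates the assistant's reply.
    return f"{SYSTEM}\n\nUSER:{question}<|endoftext|>ASSISTANT:"

print(build_prompt("Where is Ho Chi Minh City located?"))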
Model Details
Model Description
- Developed by: https://www.tii.ae
- Fine-tuned by: Virtual Interactive
- Language(s) (NLP): English, German, Spanish, French, Portuguese, Russian, Italian, Vietnamese, Indonesian, Chinese, Japanese, and Korean
- Training Time: 1,800 A100 Hours
Acknowledgement
- Thanks to TII for the amazing Falcon as the foundation model.
- Big thanks to Google for their generous Cloud credits.
Out-of-Scope Use
Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
Bias, Risks, and Limitations
Vulture-40B is trained on large-scale corpora representative of the web, so it will carry the stereotypes and biases commonly encountered online.
Recommendations
We recommend that users of Vulture-40B consider fine-tuning it for their specific tasks of interest, and that guardrails and appropriate precautions be taken for any production use.
How to Get Started with the Model
To run inference with the model in full bfloat16 precision, you need approximately 4x A100 80GB or equivalent.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model = "vilm/vulture-40B"
tokenizer = AutoTokenizer.from_pretrained(model)
m = AutoModelForCausalLM.from_pretrained(
    model,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prompt in the recommended format; the question asks, in Vietnamese,
# "Where is Ho Chi Minh City located?"
prompt = "A chat between a curious user and an artificial intelligence assistant.\n\nUSER:Thành phố Hồ Chí Minh nằm ở đâu?<|endoftext|>ASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# Sample a completion; raise max_new_tokens for longer answers.
output = m.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
    max_new_tokens=50,
)
output = output[0].to("cpu")
print(tokenizer.decode(output))
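If 4x A100 80GB is not available, one lower-memory option is 8-bit quantized loading via bitsandbytes. Below is a minimal sketch, assuming bitsandbytes and accelerate are installed; this variant is an assumption on our part and has not been validated for Vulture-40B:
from transformers import AutoTokenizer, AutoModelForCausalLM

model = "vilm/vulture-40B"
tokenizer = AutoTokenizer.from_pretrained(model)
# load_in_8bit quantizes the weights with bitsandbytes, roughly halving
# memory use compared to bfloat16 (assumed configuration, not from the
# original card).
m = AutoModelForCausalLM.from_pretrained(
    model,
    load_in_8bit=True,
    device_map="auto",
)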