---
license: apache-2.0
---
# GenZ
GenZ is a commercially usable, instruction-finetuned LLM with an 8K input token length, more up-to-date knowledge, and improved coding ability.
## Inference
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code is needed because the base model
# (Salesforce/xgen-7b-8k-base) ships a custom tokenizer.
tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-7b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("budecosystem/genz-7b", torch_dtype=torch.bfloat16)

inputs = tokenizer("The world is", return_tensors="pt")
sample = model.generate(**inputs, max_length=128)
print(tokenizer.decode(sample[0]))
```
Use the following prompt template:
```
A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: Hi, how are you? ASSISTANT:
```
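The template can also be assembled programmatically before tokenization. A minimal sketch (the `build_prompt` helper is illustrative, not part of the model repo):

```python
SYSTEM = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's questions."
)

def build_prompt(user_message: str) -> str:
    # Concatenate the system preamble, the user turn, and the
    # trailing "ASSISTANT:" cue that the model completes.
    return f"{SYSTEM} USER: {user_message} ASSISTANT:"

prompt = build_prompt("Hi, how are you?")
```

The resulting string can be passed directly to the tokenizer in the inference snippet above.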
## Finetuning
```bash
python finetune.py \
    --model_name Salesforce/xgen-7b-8k-base \
    --data_path dataset.json \
    --output_dir output \
    --trust_remote_code \
    --prompt_column instruction \
    --response_column output \
    --pad_token_id 50256
```
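Based on the `--prompt_column instruction` and `--response_column output` flags above, `dataset.json` presumably holds records with `instruction` and `output` fields. A minimal sketch of building such a file (the example rows are placeholders):

```python
import json

# Field names match the --prompt_column and --response_column
# arguments passed to finetune.py.
examples = [
    {
        "instruction": "Explain what an LLM is.",
        "output": "A large language model (LLM) is a neural network trained on text...",
    },
]

with open("dataset.json", "w") as f:
    json.dump(examples, f, indent=2)
```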
The full training code is available on GitHub: [GenZ](https://github.com/BudEcosystem/GenZ)