---
license: apache-2.0
---
### Eval

Dev-set evaluation on CS-HellaSwag:
| Model              | CS-HellaSwag Accuracy |
|--------------------|-----------------------|
| mistral7b          | 0.4992                |
| csmpt-130k steps   | 0.5004                |
| csmpt-100k steps   | 0.4959                |
| csmpt-75k steps    | 0.4895                |
| csmpt-50k steps    | 0.4755                |
| csmpt-26.5k steps  | 0.4524                |
However, when we ran validation on CS-HellaSwag, the improvements after 100k steps were very noisy, if present at all, and the improvement over mistral7b is not significant.
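As an illustration of the significance claim above, here is a minimal sketch (not the authors' evaluation code) of a paired bootstrap test over per-example correctness; the arrays `csmpt_correct` and `mistral_correct` are hypothetical 0/1 vectors you would collect from your own evaluation run:

```python
import numpy as np

def paired_bootstrap(correct_a, correct_b, n_resamples=10_000, seed=0):
    """Estimate how often model A beats model B when examples are resampled with replacement."""
    rng = np.random.default_rng(seed)
    correct_a = np.asarray(correct_a, dtype=float)
    correct_b = np.asarray(correct_b, dtype=float)
    n = len(correct_a)
    wins = 0
    for _ in range(n_resamples):
        idx = rng.integers(0, n, size=n)  # resample example indices with replacement
        if correct_a[idx].mean() > correct_b[idx].mean():
            wins += 1
    return wins / n_resamples

# Hypothetical usage, assuming per-example correctness arrays for both models:
# p = paired_bootstrap(csmpt_correct, mistral_correct)
# print(f"csmpt wins in {p:.1%} of resamples")
```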
### How to set up the environment

```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0
# Be sure to install the right flash-attn wheel; we use torch compiled with CUDA 12.1, no ABI, Python 3.9, Linux x86_64.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
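An optional sanity check (assuming the installs above succeeded) to confirm the pinned versions and that the flash-attn wheel imports cleanly in your environment:

```python
import torch
import transformers
import einops
import flash_attn

# The import of flash_attn fails loudly if the wheel does not match your torch/CUDA/Python setup.
print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("einops:", einops.__version__)
print("flash-attn:", flash_attn.__version__)
```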
### How to use in transformers
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```
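If flash-attn is not available on your machine, MPT-style configs typically also accept the plain PyTorch attention implementation. A minimal sketch of that variant, assuming the model's remote code exposes the same `attn_config` field:

```python
import torch
import transformers

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'  # fall back to the vanilla PyTorch attention kernel

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)
```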
### Our Release Plan

| Stage | Description                                                               | Date       |
|-------|---------------------------------------------------------------------------|------------|
| 1     | 'Best' model + training data                                              | 11.03.2024 |
| 2     | All checkpoints + training code                                           |            |
| 3     | Benczechmark, a collection of Czech datasets for few-shot LLM evaluation  |            |

Get in touch if you'd like to know more and contribute!
### Getting in Touch

For further questions, please email martin.fajcik@vut.cz.
### Disclaimer

This is a probabilistic model; the authors are not responsible for its outputs. Use at your own risk.
### Acknowledgement

This work was supported by the NAKI III programme of the Ministry of Culture of the Czech Republic, project semANT "Sémantický průzkumník textového kulturního dědictví" (Semantic Explorer of Textual Cultural Heritage), grant no. DH23P03OVV060, and by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: 90254).