---
license: apache-2.0
---

## Eval

### Dev eval at CS-HellaSwag

| Model             | Accuracy |
|-------------------|----------|
| mistral7b         | 0.4992   |
| csmpt-130k steps  | 0.5004   |
| csmpt-100k steps  | 0.4959   |
| csmpt-75k steps   | 0.4895   |
| csmpt-50k steps   | 0.4755   |
| csmpt-26.5k steps | 0.4524   |

However, in our HellaSwag validation runs, the improvements after 100k steps were noisy at best, and the improvement over mistral7b is not significant.
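
For reference, here is a minimal sketch of how HellaSwag-style multiple-choice accuracy is typically computed: each candidate ending is scored by its (length-normalized) log-likelihood under the model, and the highest-scoring ending is chosen. This is only an illustration of the general scoring scheme, not the exact harness used to produce the numbers above.

```python
# Illustrative only: HellaSwag-style scoring via length-normalized
# log-likelihood of each candidate ending. Not the exact evaluation
# code used for the CS-HellaSwag numbers above.
import torch
import torch.nn.functional as F

@torch.no_grad()
def choose_ending(model, tokenizer, context, endings, device='cuda:0'):
    scores = []
    for ending in endings:
        ctx_len = tokenizer(context, return_tensors='pt').input_ids.shape[1]
        full = tokenizer(context + ending, return_tensors='pt').input_ids.to(device)
        logits = model(full).logits
        # token-level log-probabilities under teacher forcing (shift by one)
        logprobs = F.log_softmax(logits[:, :-1], dim=-1)
        token_lp = logprobs.gather(-1, full[:, 1:].unsqueeze(-1)).squeeze(-1)
        ending_len = full.shape[1] - ctx_len  # approximate token boundary
        ending_lp = token_lp[:, -ending_len:].sum().item()
        scores.append(ending_lp / max(ending_len, 1))  # length normalization
    return max(range(len(endings)), key=lambda i: scores[i])
```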

## How to set up the environment

```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn wheel; we use torch compiled with CUDA 12.1, no ABI, Python 3.9, Linux x86_64.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
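
As an optional, illustrative sanity check, you can verify that the pinned versions and the flash-attn wheel resolved correctly before loading the model:

```python
# Optional sanity check of the pinned environment (illustrative).
import torch
import transformers
import flash_attn

print(transformers.__version__)               # expect 4.37.2
print(torch.__version__, torch.version.cuda)  # expect 2.1.2 built against CUDA 12.x
print(flash_attn.__version__)                 # expect 2.5.3
print(torch.cuda.is_available())              # flash-attn needs a CUDA GPU at runtime
```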

### How to use in transformers
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```
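
If a matching flash-attn wheel is not available for your platform, the checkpoint should still load with the plain PyTorch attention path. Below is a minimal sketch, assuming the MPT-style `attn_config['attn_impl'] = 'torch'` setting behaves the same way for this checkpoint (not verified here); expect slower generation and higher memory use than with flash attention.

```python
# Fallback sketch: load without flash-attn by using the standard PyTorch
# attention implementation (assumes the MPT-style 'torch' attn_impl value).
import torch
import transformers

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'torch'  # no flash-attn required
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```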

## Our Release Plan

| Stage | Description                                                              | Date       |
|-------|--------------------------------------------------------------------------|------------|
| 1     | 'Best' model + training data                                             | 11.03.2024 |
| 2     | All checkpoints + training code                                          |            |
| 3     | Benczechmark, a collection of Czech datasets for few-shot LLM evaluation |            |

Get in touch if you'd like to know more and contribute!

## Getting in Touch

For further questions, email martin.fajcik@vut.cz.

## Disclaimer

This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.

## Acknowledgement

This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT ("Sémantický průzkumník textového kulturního dědictví", grant no. DH23P03OVV060), and by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: 90254).