---
license: apache-2.0
---

### Eval

Dev-set evaluation on CS-HellaSwag (an automatically translated HellaSwag benchmark).

| Model | Accuracy |
|---------------------|------------|
| mistral7b | 0.4992 |
| csmpt-130k steps | __0.5004__ |
| csmpt-100k steps | 0.4959 |
| csmpt-75k steps | 0.4895 |
| csmpt-50k steps | 0.4755 |
| csmpt-26.5k steps | 0.4524 |

However, we ran validation on CS-HellaSwag over the course of training, and after 100k steps the improvements were noisy at best; the improvement over mistral7b is not significant. A minimal sketch of the multiple-choice scoring procedure is included in the appendix at the end of this card.

### How to set up the environment

```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn wheel; we use torch compiled with CUDA 12.1,
# no ABI, Python 3.9, Linux x86_64 architecture.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```

### How to use in transformers

```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```

### Our Release Plan

| Stage | Description | Date |
|-------|--------------------------------------------------------------------------------|------------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation | |

**Get in touch if you'd like to know more and contribute!**

## Getting in Touch

For further questions, email `martin.fajcik@vut.cz`.

## Disclaimer

This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.

## Acknowledgement

This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT --- "Sémantický průzkumník textového kulturního dědictví" (Semantic Explorer of Textual Cultural Heritage), grant no. `DH23P03OVV060`, and by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ project (ID: `90254`).
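
### Appendix: CS-HellaSwag scoring sketch

HellaSwag-style accuracy is typically computed by multiple-choice log-likelihood scoring: the model scores every candidate ending by its total log-probability given the context, and the highest-scoring ending is taken as the prediction. The sketch below only illustrates that procedure; it is not our exact evaluation harness, dataset loading is omitted, and the example field names (`context`, `endings`, `label`) are assumptions.

```python
# Minimal multiple-choice scoring sketch (NOT the exact evaluation code used for this card).
# Dataset loading is omitted; `context`, `endings` and the gold `label` are assumed to come
# from a CS-HellaSwag example in HellaSwag-like format.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = 'BUT-FIT/csmpt7b'
tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.bfloat16, trust_remote_code=True
).to('cuda:0').eval()


@torch.no_grad()
def ending_logprob(context: str, ending: str) -> float:
    """Total log-probability of the ending tokens given the context."""
    ctx_ids = tokenizer(context, return_tensors='pt').input_ids.to(model.device)
    full_ids = tokenizer(context + ending, return_tensors='pt').input_ids.to(model.device)
    logits = model(full_ids).logits  # (1, seq_len, vocab_size)
    # log-probability of each token given the preceding tokens
    logprobs = torch.log_softmax(logits[:, :-1].float(), dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = logprobs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only positions belonging to the ending; assumes the tokenization of
    # `context` is a prefix of the tokenization of `context + ending`.
    n_ctx = ctx_ids.shape[1]
    return token_logprobs[0, n_ctx - 1:].sum().item()


def predict(context: str, endings: list) -> int:
    """Index of the ending with the highest total log-probability."""
    scores = [ending_logprob(context, e) for e in endings]
    return max(range(len(endings)), key=lambda i: scores[i])

# Accuracy is then the fraction of examples where
# predict(example['context'], example['endings']) == example['label'].
```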