---
license: apache-2.0
---
### Eval
Development-set evaluation on CS-HellaSwag:
| Model              | Accuracy   |
|--------------------|------------|
| mistral7b          | 0.4992     |
| csmpt (130k steps) | __0.5004__ |
| csmpt (100k steps) | 0.4959     |
| csmpt (75k steps)  | 0.4895     |
| csmpt (50k steps)  | 0.4755     |
| csmpt (26.5k steps)| 0.4524     |

However, validation on CS-HellaSwag showed that improvements past the 100k-step checkpoint were noisy at best, and the improvement over mistral7b is not statistically significant.
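
For a sense of what "not significant" means here, one common check for an accuracy delta of this size is a paired bootstrap over per-example correctness. The sketch below is illustrative only and assumes hypothetical 0/1 correctness arrays `csmpt_correct` and `mistral_correct`, which are not part of this card:

```python
# Illustrative paired-bootstrap check (our sketch, not the authors' protocol).
import numpy as np

rng = np.random.default_rng(0)

def paired_bootstrap_pvalue(a, b, n_resamples=10_000):
    """One-sided p-value: fraction of resamples where `a` does not beat `b`."""
    a, b = np.asarray(a), np.asarray(b)
    # Resample example indices with replacement, identically for both models.
    idx = rng.integers(0, len(a), size=(n_resamples, len(a)))
    deltas = a[idx].mean(axis=1) - b[idx].mean(axis=1)
    return float((deltas <= 0).mean())

# csmpt_correct / mistral_correct would be 0/1 arrays, one entry per dev example:
# p = paired_bootstrap_pvalue(csmpt_correct, mistral_correct)
# print(f'one-sided bootstrap p-value: {p:.3f}')
```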

### How to set up the environment
```bash
pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

# Be sure to install the right flash-attn wheel; we use torch compiled with
# CUDA 12.1, no C++11 ABI, Python 3.9, Linux x86_64.
pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
```
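
As a quick sanity check (our suggestion, not part of the original instructions), you can confirm the installed versions match the wheel before loading the model:

```python
# Sanity check (our addition): confirm torch, CUDA, and flash-attn versions line up.
import torch
import flash_attn

print(torch.__version__)       # expect 2.1.2
print(torch.version.cuda)      # expect a CUDA 12.x build, matching the cu122 wheel
print(flash_attn.__version__)  # expect 2.5.3
assert torch.cuda.is_available(), 'flash-attn requires a CUDA-capable GPU'
```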

### How to use in transformers
```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.attn_config['attn_impl'] = 'flash'
config.init_device = 'cuda:0'  # For fast initialization directly on GPU!
model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # Load model weights in bfloat16
    trust_remote_code=True
)

tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(
        pipe('Nejznámějším českým spisovatelem ',  # "The best-known Czech writer "
             max_new_tokens=100,
             top_p=0.95,
             repetition_penalty=1.0,
             do_sample=True,
             use_cache=True))
```
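
If you prefer to bypass the pipeline wrapper, an equivalent call through `model.generate` might look like the following (a minimal sketch reusing `model` and `tokenizer` from above, with the same sampling parameters):

```python
# Minimal sketch (our addition): direct generation without the pipeline wrapper.
inputs = tokenizer('Nejznámějším českým spisovatelem ', return_tensors='pt').to('cuda:0')

with torch.autocast('cuda', dtype=torch.bfloat16):
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        top_p=0.95,
        repetition_penalty=1.0,
        do_sample=True,
        use_cache=True,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```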

### Our Release Plan
| Stage | Description | Date |
|-------|-------------|------|
| 1 | 'Best' model + training data | 11.03.2024 |
| 2 | All checkpoints + training code | |
| 3 | __Benczechmark__, a collection of Czech datasets for few-shot LLM evaluation | |

**Get in touch if you'd like to learn more about Benczechmark and contribute!**

## Getting in Touch
For further questions, email `martin.fajcik@vut.cz`.

## Disclaimer
This is a probabilistic model, and the authors are not responsible for its outputs. Use at your own risk.

## Acknowledgement
This work was supported by the NAKI III program of the Ministry of Culture of the Czech Republic, project semANT
("Sémantický průzkumník textového kulturního dědictví" / "Semantic Explorer of Textual Cultural Heritage", grant no. `DH23P03OVV060`), and
by the Ministry of Education, Youth and Sports of the Czech Republic through e-INFRA CZ (ID: `90254`).