Text Generation · Transformers · Safetensors · Czech · mpt · custom_code · text-generation-inference · Inference Endpoints
mfajcik committed
Commit de77494
1 Parent(s): 87bc456

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -1,7 +1,10 @@
  ---
  license: apache-2.0
  ---
- ### Eval
+ # Introduction
+
+
+ # Eval
  Dev eval at CS-HellaSwag (automatically translated HellaSwag benchmark)
  | Model | Model Accuracy |
  |---------------|----------------|
@@ -17,15 +20,17 @@ However, we ran validation over the course of training on CS-Hellaswag, and afte
  The improvement over mistral7b is not significant.


- ### How to setup environment
+ # Usage
+ ## How to Setup Environment
  ```bash
  pip install transformers==4.37.2 torch==2.1.2 einops==0.7.0

  # be sure to install right flash-attn, we use torch compiled with CUDA 12.1, no ABI, python 3.9, Linux x86_64 architecture
  pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.
  1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl
+ ```

- ### How to use in transformers
+ ## Running the Code
  ```python
  import torch
  import transformers
@@ -34,7 +39,6 @@ from transformers import pipeline
  name = 'BUT-FIT/csmpt7b'

  config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
- config.attn_config['attn_impl'] = 'flash'
  config.init_device = 'cuda:0' # For fast initialization directly on GPU!
  model = transformers.AutoModelForCausalLM.from_pretrained(
  name,
@@ -56,30 +60,26 @@ with torch.autocast('cuda', dtype=torch.bfloat16):
  do_sample=True,
  use_cache=True))

- ```
+ ```
+ # Training Data
+ We release most of our training data here \[TBD MDocekal.\].


- ### Our Release Plan
+ # Our Release Plan
  | Stage | Description | Date |
  |---------------|----------------|----------------|
  | 1 | 'Best' model + training data | 11.03.2024 |
  | 2 | All checkpoints + training code | |
  | 3 | __Benczechmark__ a collection of Czech datasets for few-shot LLM evaluation | |

-
-
- - Stage 1: 'Best' model + training data.
- - Stage 2: All checkpoints + training code
- - Stage 3: __Benczechmark__ a collection of Czech datasets. **Get in touch if you'd like to know more and contribute!**
-
  ## Getting in Touch
  For further questions, email to `martin.fajcik@vut.cz`.

- ## Disclaimer
+ # Disclaimer
  This is a probabilistic model, and authors are not responsible for the model outputs. Use at your own risk.


- ## Acknowledgement
+ # Acknowledgement
  This work was supported by NAKI III program of Ministry of Culture Czech Republic, project semANT ---
  "Sémantický průzkumník textového kulturního dědictví" grant no. `DH23P03OVV060` and
  by the Ministry of Education, Youth and Sports of the Czech Republic through the e-INFRA CZ (ID:`90254`).
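
A note on the setup section above: the flash-attn wheel URL is wrapped across two lines in the README, but pip needs it as a single argument, i.e. `pip install https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp39-cp39-linux_x86_64.whl`. Below is a minimal environment check, not from the model card; the expected version numbers are taken from the pip commands above, and the fallback message assumes the model can still run without flash-attn using its default attention implementation.

```python
# Minimal sanity check (not from the model card): verify the pinned versions
# from the setup section are the ones actually installed.
import torch
import transformers
import einops

print('transformers', transformers.__version__)  # expected: 4.37.2
print('torch       ', torch.__version__)         # expected: 2.1.2
print('einops      ', einops.__version__)        # expected: 0.7.0

try:
    import flash_attn
    print('flash-attn  ', flash_attn.__version__)  # expected: 2.5.3
except ImportError:
    # Assumption: without flash-attn the model falls back to its default attention implementation.
    print('flash-attn not installed')
```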
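The diff only shows fragments of the Python example (the imports, the config lines, and the tail of the generation call). The following is a self-contained sketch of what a full run could look like; the tokenizer loading, the `pipeline(...)` call, the prompt, and all generation parameters other than `do_sample=True` / `use_cache=True` are assumptions filled in around the visible pieces, not the card's exact snippet.

```python
import torch
import transformers
from transformers import pipeline

name = 'BUT-FIT/csmpt7b'

config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
config.init_device = 'cuda:0'  # initialize weights directly on the GPU

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights to match the autocast dtype below
    trust_remote_code=True,
)
tokenizer = transformers.AutoTokenizer.from_pretrained(name, trust_remote_code=True)

pipe = pipeline('text-generation', model=model, tokenizer=tokenizer, device='cuda:0')

# Illustrative Czech prompt ("The most famous Czech writer is"); not the card's original prompt.
prompt = 'Nejznámějším českým spisovatelem je'

with torch.autocast('cuda', dtype=torch.bfloat16):
    print(pipe(prompt,
               max_new_tokens=100,  # illustrative generation settings
               do_sample=True,
               use_cache=True))
```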
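The card reports dev-set accuracy on CS-HellaSwag, but the evaluation code is not part of this commit. One common way to score HellaSwag-style items is to pick the ending with the highest length-normalized log-likelihood under the model; the sketch below illustrates that convention and is not necessarily the protocol behind the numbers in the table. The `choose_ending` helper and the clean context/ending token split are simplifying assumptions.

```python
import torch
import torch.nn.functional as F

def choose_ending(model, tokenizer, context: str, endings: list[str]) -> int:
    """Return the index of the ending with the highest mean token log-likelihood.

    Hypothetical helper, not from the model card. Assumes tokenizing
    `context + ending` splits cleanly at the context boundary.
    """
    scores = []
    ctx_len = tokenizer(context, return_tensors='pt').input_ids.shape[1]
    for ending in endings:
        full_ids = tokenizer(context + ending, return_tensors='pt').input_ids.to(model.device)
        with torch.no_grad():
            logits = model(full_ids).logits                # (1, seq_len, vocab)
        log_probs = F.log_softmax(logits[:, :-1], dim=-1)  # positions predicting tokens 1..seq_len-1
        targets = full_ids[:, 1:]
        token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        ending_lp = token_lp[:, ctx_len - 1:]              # log-probs of the ending tokens only
        scores.append(ending_lp.mean().item())             # length normalization
    return max(range(len(scores)), key=scores.__getitem__)
```

Accuracy would then be the fraction of benchmark items for which `choose_ending` matches the gold label.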