avemio-digital committed
Commit 1a6d08b · verified · 1 Parent(s): 015a112

Update README.md

Files changed (1): README.md (+21 −21)
README.md CHANGED
@@ -1,7 +1,7 @@
  ---
  license: apache-2.0
  datasets:
- - avemio/GRAG-CPT-HESSIAN-AI
+ - avemio/German_RAG-CPT-HESSIAN-AI
  language:
  - en
  - de
@@ -18,25 +18,25 @@ tags:
  ---
 
 
- <img src="https://www.grag.ai/wp-content/uploads/2024/12/GRAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="GRAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
+ <img src="https://www.German_RAG.ai/wp-content/uploads/2024/12/German_RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German_RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>
 
 
- # GRAG-PHI-3-mini-4B-CPT-HESSIAN-AI
+ # German_RAG-PHI-3-mini-4B-CPT-HESSIAN-AI
 
  <!-- Provide a quick summary of what the model is/does. -->
 
- **GRAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025
+ **German_RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions to drive German research collaboration in business-focused Generative AI by 2025.
 
- Our GRAG-PHI-CPT model are trained on this **[GRAG-CPT](https://huggingface.co/datasets/avemio/GRAG-CPT-HESSIAN-AI) dataset.**
+ Our German_RAG-PHI-CPT model is trained on the **[German_RAG-CPT](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) dataset.**
 
  ## Model Details
 
  The core models released in this batch are the following:
  | Model | Training Tokens |
  |------|--------|
- | [GRAG-PHI-CPT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI) | 507.47 million |
- | [GRAG-PHI-SFT](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | 2.03 billion |
- | [GRAG-PHI-ORPO](https://huggingface.co/avemio/GRAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | 2.0577 billion |
+ | [German_RAG-PHI-CPT](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI) | 507.47 million |
+ | [German_RAG-PHI-SFT](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | 2.03 billion |
+ | [German_RAG-PHI-ORPO](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | 2.0577 billion |
  ### Model Description
 
  <!-- Provide a longer summary of what this model is. -->
@@ -46,19 +46,19 @@ The core models released in this batch are the following:
  - **Model type:** a Transformer-style autoregressive language model.
  - **Language(s) (NLP):** German, English
  - **License:** The code and model are released under Apache 2.0.
- - **Contact:** [grag@avemio.digital](mailto:grag@avemio.digital)
+ - **Contact:** [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)
 
 
  ### Model Sources
 
  <!-- Provide the basic links for the model. -->
 
- - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/GRAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
+ - **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German_RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
  - **Repositories:**
    - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
    - Evaluation code:
-     - [GRAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/GRAG-LLM-HARD-BENCHMARK.git)
-     - [GRAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/GRAG-LLM-EASY-BENCHMARK.git)
+     - [German_RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-HARD-BENCHMARK.git)
+     - [German_RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-EASY-BENCHMARK.git)
  - **Technical blog post:**
  <!-- - **Press release:** TODO -->
@@ -72,7 +72,7 @@ Now, proceed as usual with HuggingFace:
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
 
- model_name = "avemio/GRAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"
+ model_name = "avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"
 
  tokenizer = AutoTokenizer.from_pretrained(model_name)
 
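The README snippet in this hunk breaks off after the tokenizer line. A minimal sketch of the full load-and-generate flow under the standard transformers API (the prompt and generation settings are illustrative assumptions, and `device_map="auto"` requires accelerate to be installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# A CPT (continued pre-training) checkpoint is a plain causal LM,
# so simple text completion is the natural interface.
prompt = "Retrieval Augmented Generation ist"  # illustrative German prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```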
@@ -93,7 +93,7 @@ We are providing a comprehensive Google Colab notebook to guide users through th
  ## Model Details
 
  ### Data
- For training data details, please see the [GRAG-CPT-Dataset](https://huggingface.co/datasets/avemio/GRAG-CPT-HESSIAN-AI) documentation.
+ For training data details, please see the [German_RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) documentation.
 
  #### Description
  CPT – Continued Pre-Training
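Since the card defers to the dataset documentation, a quick way to inspect the CPT data is via the datasets library; a sketch in which the split name is an assumption, so check the dataset card:

```python
from datasets import load_dataset

# Dataset ID taken from the card; the "train" split name is an assumption.
ds = load_dataset("avemio/German_RAG-CPT-HESSIAN-AI", split="train")
print(ds)     # column names and row count
print(ds[0])  # inspect one CPT example
```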
@@ -108,7 +108,7 @@ The summarization task teaches models to distill complex information into clear,
  ### Architecture
 
 
- | Parameter | GRAG-PHI-CPT |
+ | Parameter | German_RAG-PHI-CPT |
  |-----------------------|-----------------------------------------------------------------------------------------------|
  | **d_model** | 4096 |
  | **num heads** | 32 |
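The architecture rows can be cross-checked against the checkpoint's own config without downloading the weights; a sketch using the usual Hugging Face config attribute names, which can differ per architecture:

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI")
# Typical config fields; exact attribute names vary by model family.
print(cfg.hidden_size)          # should correspond to d_model above
print(cfg.num_attention_heads)  # should correspond to "num heads"
print(cfg.num_hidden_layers)
```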
@@ -126,7 +126,7 @@ The summarization task teaches models to distill complex information into clear,
  ### Hyperparameters
 
 
- | Parameter | GRAG-PHI-CPT |
+ | Parameter | German_RAG-PHI-CPT |
  |---------------------------|--------------------|
  | **warmup steps** | 50 |
  | **peak LR** | 5.0E-07 |
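To make the visible rows concrete, here is how they would map onto transformers TrainingArguments; only the two values shown in this hunk come from the card, and every other argument is a placeholder assumption:

```python
from transformers import TrainingArguments

# Only warmup_steps and learning_rate come from the table above;
# the remaining values are illustrative placeholders.
args = TrainingArguments(
    output_dir="./German_RAG-phi-cpt",  # hypothetical output path
    warmup_steps=50,                    # "warmup steps" row
    learning_rate=5.0e-7,               # "peak LR" row
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
)
```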
@@ -137,19 +137,19 @@ The summarization task teaches models to distill complex information into clear,
 
  ## Environmental Impact
 
- GRAG-PHI-CPT, running on NVIDIA A100 with 40 GPUs for 2 days, has an approximate power consumption as follows:
+ German_RAG-PHI-CPT, trained on 40 NVIDIA A100 GPUs for 2 days, has an approximate energy consumption as follows:
 
  Actual power consumption may vary depending on the specific workload and operational conditions; for accurate measurements, use dedicated power-monitoring tools.
 
  | Model | GPU Type | Energy Consumption From GPUs |
  |----------------|---------------------|-----------------------------|
- | GRAG-PHI-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00576MWh MWh |
+ | German_RAG-PHI-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00576 MWh |
  ## Bias, Risks, and Limitations
 
  Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
  Such content can also be produced unintentionally, especially in the case of bias, so we recommend that users consider the risks of applying this technology.
 
- Otherwise, many facts from GRAG-MISTRAL-CPT or any LLM will often not be true, so they should be checked.
+ Beyond that, statements produced by German_RAG-PHI-CPT, as by any LLM, will often not be true, so they should be checked.
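For readers who want to sanity-check figures like this, the usual back-of-the-envelope estimate is GPU count × hours × average per-GPU draw; the draw below is an assumed value, so the result is only indicative and will differ from measured consumption:

```python
# Rough GPU energy estimate: count * hours * average draw (kW).
num_gpus = 40        # from the card
hours = 2 * 24       # 2 days, from the card
avg_draw_kw = 0.25   # assumed average draw per A100 (TDP is 0.4 kW)

energy_mwh = num_gpus * hours * avg_draw_kw / 1000
print(f"{energy_mwh:.3f} MWh")  # ~0.480 MWh under these assumptions
```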
@@ -157,9 +157,9 @@ Otherwise, many facts from GRAG-MISTRAL-CPT or any LLM will often not be true, s
  ## Model Card Contact
 
 
- For errors in this model card, please contact ([grag@avemio.digital](mailto:grag@avemio.digital)).
+ For errors in this model card, please contact [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital).
 
- ## The GRAG AI Team
+ ## The German_RAG AI Team
  [Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
  [Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
  [Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)