Update README.md

README.md (CHANGED)

@@ -1,7 +1,7 @@
---
license: apache-2.0
datasets:
- avemio/German_RAG-CPT-HESSIAN-AI
language:
- en
- de

@@ -18,25 +18,25 @@ tags:
---

<img src="https://www.German_RAG.ai/wp-content/uploads/2024/12/German_RAG-ICON-TO-WORDLOGO-Animation_Loop-small-ezgif.com-video-to-gif-converter.gif" alt="German_RAG Logo" width="400" style="margin-left:'auto' margin-right:'auto' display:'block'"/>

# German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI

<!-- Provide a quick summary of what the model is/does. -->

**German_RAG** (**G**erman **R**etrieval **A**ugmented **G**eneration) models are designed for the German-speaking market, enabling innovation and AI solutions that drive German research collaboration in business-focused Generative AI by 2025.

Our German_RAG-PHI-CPT model is trained on the **[German_RAG-CPT](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) dataset**.

## Model Details

The core models released in this batch are the following:

| Model | Training Tokens |
|------|--------|
| [German_RAG-PHI-CPT](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI) | 507.47 million |
| [German_RAG-PHI-SFT](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-SFT-HESSIAN-AI) | 2.03 billion |
| [German_RAG-PHI-ORPO](https://huggingface.co/avemio/German_RAG-PHI-3.5-MINI-4B-ORPO-HESSIAN-AI) | 2.0577 billion |

### Model Description

<!-- Provide a longer summary of what this model is. -->

@@ -46,19 +46,19 @@ The core models released in this batch are the following:
- **Model type:** a Transformer-style autoregressive language model.
- **Language(s) (NLP):** German, English
- **License:** The code and model are released under Apache 2.0.
- **Contact:** [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital)

### Model Sources

<!-- Provide the basic links for the model. -->

- **Training Study:** [Training Study](https://avemio.digital/wp-content/uploads/2025/01/German_RAG-TRAINING-STUDY-Advancing-German-Language-AI-with-hessian-AI.pdf)
- **Repositories:**
  - Training: [Colab-Notebook](https://colab.research.google.com/drive/18SH_aYLCnw1K7cRGOTTZ80y98V5Kquxb?usp=sharing)
  - Evaluation code:
    - [German_RAG-LLM-HARD-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-HARD-BENCHMARK.git)
    - [German_RAG-LLM-EASY-BENCHMARK](https://github.com/avemio-digital/German_RAG-LLM-EASY-BENCHMARK.git)
- **Technical blog post:**
<!-- - **Press release:** TODO -->

@@ -72,7 +72,7 @@ Now, proceed as usual with HuggingFace:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"

tokenizer = AutoTokenizer.from_pretrained(model_name)
```
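The hunk above only shows the import, model id, and tokenizer setup. As a minimal sketch of the typical next steps with this checkpoint, the snippet below loads the model and generates a plain completion; the dtype, device placement, prompt, and generation settings are illustrative assumptions, not taken from this README.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI"

# Load tokenizer and model; bfloat16 + device_map="auto" are assumptions to fit the ~4B model on one GPU.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# CPT checkpoints are base-style models, so plain text completion is used here (no chat template).
prompt = "Retrieval Augmented Generation (RAG) ist"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```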

@@ -93,7 +93,7 @@ We are providing a comprehensive Google Colab notebook to guide users through th
## Model Details

### Data
For training data details, please see the [German_RAG-CPT-Dataset](https://huggingface.co/datasets/avemio/German_RAG-CPT-HESSIAN-AI) documentation.

#### Description
CPT – Continued Pre-Training
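A small sketch of pulling that corpus for inspection with the `datasets` library; the split name below is an assumption, so check the dataset card for the actual configurations and splits.

```python
from datasets import load_dataset

# Assumption: the default configuration exposes a "train" split; adjust to the real config/split names.
cpt_data = load_dataset("avemio/German_RAG-CPT-HESSIAN-AI", split="train")
print(cpt_data)      # dataset size and column names
print(cpt_data[0])   # first training example
```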

@@ -108,7 +108,7 @@ The summarization task teaches models to distill complex information into clear,
### Architecture

| Parameter | German_RAG-PHI-CPT |
|-----------------------|--------------------|
| **d_model** | 4096 |
| **num heads** | 32 |
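These values can be cross-checked against the released checkpoint's configuration; a brief sketch using the standard Hugging Face config fields, where `hidden_size` corresponds to d_model.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("avemio/German_RAG-PHI-3.5-MINI-4B-CPT-HESSIAN-AI")
print(config.hidden_size)           # d_model
print(config.num_attention_heads)   # num heads
```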

@@ -126,7 +126,7 @@ The summarization task teaches models to distill complex information into clear,
### Hyperparameters

| Parameter | German_RAG-PHI-CPT |
|---------------------------|--------------------|
| **warmup steps** | 50 |
| **peak LR** | 5.0E-07 |
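For illustration only, here is how the two values shown in this hunk would map onto a Hugging Face `TrainingArguments` object; every other argument is a placeholder assumption and not taken from the actual training setup.

```python
from transformers import TrainingArguments

# Only warmup_steps and learning_rate come from the table above; the rest are illustrative defaults.
training_args = TrainingArguments(
    output_dir="german_rag_phi_cpt",   # hypothetical output path
    warmup_steps=50,                   # "warmup steps" from the table
    learning_rate=5.0e-7,              # "peak LR" from the table
    per_device_train_batch_size=1,     # assumption
    num_train_epochs=1,                # assumption
)
```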

@@ -137,19 +137,19 @@ The summarization task teaches models to distill complex information into clear,

## Environmental Impact

German_RAG-PHI-CPT, running on 40 NVIDIA A100 GPUs for 2 days, has an approximate power consumption as follows:

It's important to note that the actual power consumption may vary depending on the specific workload and operational conditions. For accurate power consumption measurements, using dedicated power monitoring tools is recommended.

| Model | GPU Type | Power Consumption From GPUs |
|----------------|---------------------|-----------------------------|
| German_RAG-PHI-CPT | A100 ([Hessian AI supercomputer](https://hessian.ai/de/)) | 0.00576 MWh |

## Bias, Risks, and Limitations

Like any base language model or fine-tuned model without safety filtering, it is relatively easy for a user to prompt these models to generate harmful and generally sensitive content.
Such content can also be produced unintentionally, especially in the case of bias, so we recommend users consider the risks of applications of this technology.

Additionally, many facts generated by German_RAG-PHI-CPT, or any LLM, will often not be true, so they should be checked.

@@ -157,9 +157,9 @@ Otherwise, many facts from GRAG-MISTRAL-CPT or any LLM will often not be true, s
## Model Card Contact

For errors in this model card, please contact [German_RAG@avemio.digital](mailto:German_RAG@avemio.digital).

## The German_RAG AI Team

[Marcel Rosiak](https://de.linkedin.com/in/marcel-rosiak)
[Soumya Paul](https://de.linkedin.com/in/soumya-paul-1636a68a)
[Siavash Mollaebrahim](https://de.linkedin.com/in/siavash-mollaebrahim-4084b5153?trk=people-guest_people_search-card)