Crystalcareai
committed
Update README.md
README.md CHANGED
@@ -19,9 +19,9 @@ Llama-3-SEC is a state-of-the-art domain-specific large language model trained o
 
 ## Model Details
 
 - **Base Model:** Meta-Llama-3-70B-Instruct
-- **Training Data:** 19B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset: [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) to maintain a balance between domain-specific knowledge and general language understanding
-- **Training Method:** Continual Pre-Training (CPT) using the Megatron-Core framework, followed by model merging with the base model using the state-of-the-art TIES merging technique in the Arcee Mergekit toolkit
-- **Training Infrastructure:** AWS SageMaker HyperPod cluster with 4 nodes, each equipped with 32 H100 GPUs, ensuring efficient and scalable training of this massive language model
+- **Training Data:** 19B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) dataset to maintain a balance between domain-specific knowledge and general language understanding.
+- **Training Method:** Continual Pre-Training (CPT) using the Megatron-Core framework, followed by model merging with the base model using the state-of-the-art TIES merging technique in the Arcee Mergekit toolkit. The merged model then underwent supervised fine-tuning on an 8xH100 node using [Spectrum](https://arxiv.org/abs/2406.06623), with a mixture of custom domain-specific and general open-source datasets.
+- **Training Infrastructure:** AWS SageMaker HyperPod cluster with 4 nodes, each equipped with 32 H100 GPUs, ensuring efficient and scalable training of this massive language model.
 
 ## Use Cases
 
@@ -55,9 +55,9 @@ These results demonstrate significant improvements in domain-specific performanc
 
 ## Training and Inference
 
-Llama-3-SEC has been trained using the
+Llama-3-SEC has been trained using the chatml chat template. This template ensures that the model maintains its strong conversational abilities while incorporating the domain-specific knowledge acquired during the CPT process.
 
-To run inference with the Llama-3-SEC model using the
+To run inference with the Llama-3-SEC model using the chatml chat template, you can use the following code:
 
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer
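# The diff is truncated after this import, so the README's full example does not
# appear above. What follows is an illustrative sketch of chatml-style inference
# using the transformers chat-template API, not the original README code; the
# repository ID, prompt, and generation settings below are assumptions.

model_id = "arcee-ai/Llama-3-SEC"  # hypothetical hub ID, for illustration only

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # choose a dtype appropriate for your hardware
    device_map="auto",    # requires the accelerate package
)

# apply_chat_template formats the messages with the tokenizer's chat template
# (chatml for this model, per the README) and returns input ids.
messages = [
    {"role": "system", "content": "You are an expert analyst of SEC filings."},
    {"role": "user", "content": "What are the key sections of a 10-K filing?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a completion and decode only the newly generated tokens.
output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```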