Shamane committed on
Commit 923b1b7
1 Parent(s): 8258780

Update README.md


minor README edit

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -25,7 +25,7 @@ GGUFS: https://huggingface.co/arcee-ai/Llama-3-SEC-Chat-GGUF
 ## Model Details
 
 - **Base Model:** Meta-Llama-3-70B-Instruct
-- **Training Data:** 19B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset: [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) to maintain a balance between domain-specific knowledge and general language understanding.
+- **Training Data**: ***This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training.*** The final model is being trained with 72B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset: [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) to maintain a balance between domain-specific knowledge and general language understanding
 - **Training Method:** Continual Pre-Training (CPT) using the Megatron-Core framework, followed by model merging with the base model using the state-of-the-art TIES merging technique in the Arcee Mergekit toolkit. It then underwent supervised fine-tuning on an 8xH100 node using [Spectrum](https://arxiv.org/abs/2406.06623). We used a mixture of custom domain specific and general open-source datasets.
 - **Training Infrastructure:** AWS SageMaker HyperPod cluster with 4 nodes, each equipped with 32 H100 GPUs, ensuring efficient and scalable training of this massive language model.
 
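
The TIES merge referenced in the Training Method bullet is performed with Arcee's Mergekit toolkit, which is driven by a YAML recipe passed to its `mergekit-yaml` CLI. Below is a minimal sketch of how a CPT checkpoint could be folded back into the base model this way; the checkpoint path, density/weight values, dtype, and output directory are illustrative assumptions, not the actual recipe used for Llama-3-SEC-Chat.

```python
# Minimal sketch (not the actual recipe): merging a continually pre-trained
# checkpoint with its base model via a TIES merge using Mergekit.
import subprocess

import yaml

# All paths and parameter values below are illustrative assumptions.
ties_config = {
    "merge_method": "ties",
    "base_model": "meta-llama/Meta-Llama-3-70B-Instruct",
    "models": [
        {
            # Hypothetical path to the CPT checkpoint produced with Megatron-Core.
            "model": "path/to/llama-3-sec-cpt-checkpoint",
            "parameters": {"density": 0.5, "weight": 0.5},
        },
    ],
    "parameters": {"normalize": True},
    "dtype": "bfloat16",
}

# Write the recipe to disk, then let mergekit-yaml <config> <output-dir>
# run the merge described in the config.
with open("ties_merge.yml", "w") as f:
    yaml.safe_dump(ties_config, f)

subprocess.run(["mergekit-yaml", "ties_merge.yml", "./llama-3-sec-merged"], check=True)
```

In the pipeline described above, the merged model would then go through the Spectrum-based supervised fine-tuning step before release.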