Update README.md
README.md CHANGED
@@ -20,7 +20,7 @@ Llama-3-SEC is a state-of-the-art domain-specific large language model trained o
 ## Model Details
 
 - **Base Model:** Meta-Llama-3-70B-Instruct
-- **Training Data**: ***This is an intermediate checkpoint of our final model, which has seen 20B tokens so far
+- **Training Data**: ***This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training.*** The final model is being trained with 72B tokens of SEC filings data, carefully mixed with 1B tokens of general data from Together AI's RedPajama dataset: [RedPajama-Data-1T](https://huggingface.co/datasets/togethercomputer/RedPajama-Data-1T) to maintain a balance between domain-specific knowledge and general language understanding
 - **Training Method:** Continual Pre-Training (CPT) using the Megatron-Core framework, followed by model merging with the base model using the state-of-the-art TIES merging technique in the Arcee Mergekit toolkit
 - **Training Infrastructure:** AWS SageMaker HyperPod cluster with 4 nodes, each equipped with 32 H100 GPUs, ensuring efficient and scalable training of this massive language model
 
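
For readers who want to try the intermediate checkpoint described above, here is a minimal loading sketch using Hugging Face `transformers`. The repo id is a placeholder (the actual Llama-3-SEC checkpoint name is not given in this diff), and the bf16 dtype and `device_map="auto"` sharding are assumptions reasonable for a 70B-parameter model, not settings taken from the README.

```python
# Minimal usage sketch -- the repo id below is a placeholder; substitute the
# actual Llama-3-SEC checkpoint published on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Llama-3-SEC-checkpoint"  # hypothetical repo id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halve memory vs. fp32 for the 70B weights
    device_map="auto",           # shard the model across available GPUs
)

prompt = "Summarize the key risk factors commonly disclosed in SEC 10-K filings."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens and print only the newly generated continuation.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```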