m-a-p/Amber-Reproduce-88.08B

	Architecture & Training Configuration:

	- Base Model Configuration: This variant uses the Llama2-7B model configuration.

	- Sequence Length Adaptation: Data originally prepared for a sequence length of 2048 was detokenized and re-encoded for a sequence length of 4096, following the preprocessing strategy of Megatron-LM. The longer sequence length lets the model train on longer contexts.

	- Batch Size & Token Management: We used a batch size of approximately 4 million tokens, chosen to accommodate the increased sequence length.

	- Grouped-Query Attention (GQA): To improve training efficiency, the configuration uses grouped-query attention (GQA), with 32 attention heads and a group size of 4.
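The numbers in the list above can be worked through in a short Python sketch. It is only an illustration under stated assumptions, not the training code behind this model: "4 million tokens" is taken to mean 4 * 2**20, GQA is read as grouped-query attention, and "group size of 4" is shown under both plausible readings because the card does not say which one applies. The `repack` helper and the `eos_id` parameter are hypothetical names; the helper is a simplified stand-in for re-chunking data to a new sequence length, not Megatron-LM's actual preprocessing.

```python
from typing import Iterable, Iterator, List

# Values stated in the model card.
OLD_SEQ_LEN = 2048
NEW_SEQ_LEN = 4096
NUM_ATTENTION_HEADS = 32
GQA_GROUP_SIZE = 4

# Assumption: "4 million tokens" is read as 4 * 2**20; the card gives no exact figure.
TOKENS_PER_BATCH = 4 * 2**20


def sequences_per_batch(tokens_per_batch: int, seq_len: int) -> int:
    """Number of fixed-length sequences that fit in one batch of tokens."""
    return tokens_per_batch // seq_len


def kv_heads_reading_a(num_heads: int, group_size: int) -> int:
    """Reading A (assumption): every `group_size` query heads share one KV head."""
    if num_heads % group_size != 0:
        raise ValueError("num_heads must be divisible by group_size")
    return num_heads // group_size


def queries_per_kv_reading_b(num_heads: int, num_groups: int) -> int:
    """Reading B (assumption): there are `num_groups` KV heads in total."""
    if num_heads % num_groups != 0:
        raise ValueError("num_heads must be divisible by num_groups")
    return num_heads // num_groups


def repack(documents: Iterable[List[int]], seq_len: int, eos_id: int) -> Iterator[List[int]]:
    """Concatenate token lists (each followed by eos_id) and cut the stream
    into sequences of exactly seq_len tokens. A trailing partial sequence
    is dropped. Hypothetical helper for illustration only."""
    buffer: List[int] = []
    for doc in documents:
        buffer.extend(doc)
        buffer.append(eos_id)
        while len(buffer) >= seq_len:
            yield buffer[:seq_len]
            buffer = buffer[seq_len:]


if __name__ == "__main__":
    print("sequences per batch at 2048:", sequences_per_batch(TOKENS_PER_BATCH, OLD_SEQ_LEN))
    print("sequences per batch at 4096:", sequences_per_batch(TOKENS_PER_BATCH, NEW_SEQ_LEN))
    print("KV heads (reading A):", kv_heads_reading_a(NUM_ATTENTION_HEADS, GQA_GROUP_SIZE))
    print("query heads per KV head (reading B):",
          queries_per_kv_reading_b(NUM_ATTENTION_HEADS, GQA_GROUP_SIZE))
```

Under the 4 * 2**20 assumption, doubling the sequence length from 2048 to 4096 halves the number of sequences per batch while the token count per batch stays the same. The two GQA readings give different key/value head counts, so the model's published configuration file is the place to confirm which one was used.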