**InfiniTransformer, Gemma/Llama3 based Implementation!**
> Update @ 2024.04.19: It now supports Llama-3!
> Note: this implementation is unofficial
This implementation is designed to handle virtually infinite context lengths.
Here's the GitHub repo: https://github.com/Beomi/InfiniTransformer
**Read the original paper:** https://arxiv.org/abs/2404.07143
## **Focus on Infini-Attention**
- **Two implementations available:** an attention-layer-only implementation and a model- and training-level implementation.
- **Fixed (segment-dependent) memory usage:** Memory depends on the segment size rather than the total sequence length, which enables training larger models on longer sequences without the memory overhead typical of standard Transformer implementations.
- **Infinite context capability:** Train with very long sequences: imagine handling sequence lengths of up to 1 million on standard hardware!
  - For example, you could train Gemma-2B with a 1M sequence length and a 2K segment size on a single H100 GPU.
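To make the fixed-memory point concrete, here is a minimal pure-Python sketch of the compressive memory described in the linked paper: a matrix `M` and a normalization vector `z` that are read before each segment and updated after it. This is not the code from the repo; the names `CompressiveMemory`, `elu_plus_one`, and `feature_map` are hypothetical, and the sketch leaves out the local dot-product attention and the learned gate that mixes it with the memory read.

```python
import math
import random


def elu_plus_one(x):
    # sigma(x) = ELU(x) + 1, which keeps every feature strictly positive.
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(rows):
    return [[elu_plus_one(v) for v in row] for row in rows]


class CompressiveMemory:
    """Hypothetical helper: state is M (d_key x d_value) plus z (d_key)."""

    def __init__(self, d_key, d_value):
        self.M = [[0.0] * d_value for _ in range(d_key)]
        self.z = [0.0] * d_key

    def retrieve(self, queries):
        # A_mem = sigma(Q) M / (sigma(Q) z), one output row per query.
        out = []
        for q in feature_map(queries):
            norm = sum(qi * zi for qi, zi in zip(q, self.z)) + 1e-6
            out.append([
                sum(q[i] * self.M[i][j] for i in range(len(q))) / norm
                for j in range(len(self.M[0]))
            ])
        return out

    def update(self, keys, values):
        # M <- M + sigma(K)^T V   and   z <- z + sum_t sigma(K_t)
        for k, v in zip(feature_map(keys), values):
            for i, ki in enumerate(k):
                self.z[i] += ki
                for j, vj in enumerate(v):
                    self.M[i][j] += ki * vj

    def num_floats(self):
        return len(self.M) * len(self.M[0]) + len(self.z)


def random_rows(n, d):
    return [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]


random.seed(0)
d_key, d_value, segment_len = 4, 3, 8  # toy sizes, chosen for illustration
memory = CompressiveMemory(d_key, d_value)
sizes = []
for _ in range(5):  # five consecutive segments of one long sequence
    Q = random_rows(segment_len, d_key)
    K = random_rows(segment_len, d_key)
    V = random_rows(segment_len, d_value)
    from_memory = memory.retrieve(Q)  # read what earlier segments stored
    memory.update(K, V)               # then fold this segment into the state
    sizes.append(memory.num_floats())
print(sizes)  # [16, 16, 16, 16, 16]
```

The state stays at `d_key * d_value + d_key` floats no matter how many segments are processed, which is why the memory cost follows the segment size rather than the full sequence length.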
## **Try InfiniTransformer**
1. **Clone the repository:**
   ```bash
   git clone https://github.com/Beomi/InfiniTransformer
   ```
2. **Install necessary tools:**
   ```bash
   pip install -r requirements.txt
   pip install -e git+https://github.com/huggingface/transformers.git@b109257f4f#egg=transformers
   ```
3. **Dive Deep into Custom Training:**
   - Train with extensive sequence lengths using scripts such as:
   ```bash
   ./train.gemma.infini.noclm.1Mseq.sh
   ```
For more detailed information, please visit the repo: https://github.com/Beomi/InfiniTransformer
Looking forward to your feedback!
P.S. The training loss plot is attached to the original post.