LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper: arXiv 2309.12307
Note Trains a LoRA while shifting local attention (shifted sparse attention, S²-Attn); achieves better perplexity than training-free long-context methods. A sketch of the attention pattern follows below.
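A minimal PyTorch sketch of the S²-Attn idea behind that note, under stated assumptions: the sequence is split into equal groups, attention runs only within each group, and half the heads are rolled by half a group so information crosses group borders. The causal mask and the LoRA adapters themselves are omitted for brevity, and all names here are illustrative rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def s2_attention(q, k, v, group_size):
    """Shifted sparse attention sketch. q, k, v: (batch, seq, heads, dim)."""
    B, S, H, D = q.shape
    assert S % group_size == 0, "seq length must be divisible by group_size"

    def shift(t, offset):
        # Roll the second half of the heads along the sequence axis so
        # their groups straddle the original group borders.
        t = t.clone()
        t[:, :, H // 2:] = t[:, :, H // 2:].roll(offset, dims=1)
        return t

    q, k, v = (shift(t, -group_size // 2) for t in (q, k, v))

    def group(t):
        # Fold groups into the batch dim: (B*num_groups, H, group, D).
        return t.reshape(B * S // group_size, group_size, H, D).transpose(1, 2)

    # Standard attention, but only within each group (causal mask omitted).
    out = F.scaled_dot_product_attention(group(q), group(k), group(v))
    out = out.transpose(1, 2).reshape(B, S, H, D)

    # Undo the shift so token positions line up again.
    return shift(out, group_size // 2)

# Usage: full-attention cost O(S^2) drops to roughly O(S * group_size).
q = k = v = torch.randn(2, 8192, 8, 64)
print(s2_attention(q, k, v, group_size=2048).shape)  # (2, 8192, 8, 64)
```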
Note Explains what may cause instability in transformer training. Learning-rate (LR) sensitivity is a good metric for training stability, and the explanation of why loss spikes occur is also very interesting.
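A hypothetical sketch of an LR-sensitivity metric in the spirit of that note, assuming the common definition of the average gap between each run's final loss and the best run's loss across a learning-rate sweep, with divergent losses capped. `train_and_eval` is an assumed stand-in for a real training run, not an API from the paper.

```python
import numpy as np

def lr_sensitivity(train_and_eval, lrs, loss_cap=10.0):
    """Mean excess final loss over the best run across an LR sweep."""
    # Cap divergent losses so one blown-up run doesn't dominate the metric.
    losses = np.minimum([train_and_eval(lr) for lr in lrs], loss_cap)
    # ~0 when loss is flat across LRs (stable); large when many LRs
    # diverge or badly underperform the best setting (unstable).
    return float(np.mean(losses - losses.min()))

# Usage with a toy stand-in: a quadratic "loss landscape" in log-LR,
# minimized near lr = 3e-3.
toy = lambda lr: (np.log10(lr) + 2.5) ** 2 + 2.0
print(lr_sensitivity(toy, np.logspace(-4, -1, 7)))
```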