200k model

#2
by KnutJaegersberg - opened

One application of quantized models I'm interested in is long-context inference. Will you make a 2-bit Yi-34B-200K?
I think that would be great for few-shot learning over long contexts!

Thanks for your continued attention to this work. The long-context model is in the plan; for now we are paying more attention to lossless (<1% degradation) quantization at lower bit-widths (e.g., 2-bit, which is already quite close to our expectation). Once we reach that point, we will release more compressed models to the community step by step.

If you can share some meaningful metrics for measuring the performance of "lossless" compressed long-context models, that would be extremely helpful for our current research, thanks.

I can point you to existing work on long-context-window LLMs.

YaRN uses perplexity, comparing different models at different context lengths:
https://huggingface.co/NousResearch/Yarn-Llama-2-70b-32k
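For concreteness, here is a minimal sketch of that kind of evaluation, assuming a Hugging Face causal LM and a single long evaluation document. The model name, file path, and context lengths are placeholders, and YaRN's actual evaluation uses sliding windows over a fixed corpus rather than a single truncated document:

```python
# Minimal sketch: perplexity of a causal LM at several context lengths.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "01-ai/Yi-34B-200K"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

def perplexity(text: str, context_len: int) -> float:
    """Perplexity over one document truncated to `context_len` tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[:, :context_len].to(model.device)
    with torch.no_grad():
        # labels == input_ids makes the model return the mean cross-entropy loss
        loss = model(ids, labels=ids).loss
    return torch.exp(loss).item()

long_document = open("long_document.txt").read()  # assumed long evaluation text
for n in (4096, 32768, 131072, 200000):
    print(n, perplexity(long_document, n))
```

For a healthy long-context model the perplexity curve should stay flat or improve as the window grows; a sharp rise past some length suggests the context extension (or the quantization) is degrading.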

Giraffe makes use of several QA benchmarks:
https://github.com/abacusai/long-context

LongLLaMA assesses improvement via passkey retrieval, QA over research papers, and gains in few-shot learning from using longer-context few-shot examples:
https://huggingface.co/syzymon/long_llama_3b
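As a rough illustration, below is a sketch of how a passkey-retrieval example can be constructed. The filler sentences, passkey format, and prompt wording are assumptions for illustration, not the exact prompts from the LongLLaMA evaluation:

```python
# Minimal sketch of building a passkey-retrieval prompt.
import random

def build_passkey_prompt(n_filler: int, seed: int = 0) -> tuple[str, str]:
    """Hide a random passkey at a random position inside repeated filler text."""
    rng = random.Random(seed)
    passkey = str(rng.randint(10000, 99999))
    filler = "The grass is green. The sky is blue. The sun is yellow. "
    needle = f"The pass key is {passkey}. Remember it. "
    position = rng.randint(0, n_filler)
    prompt = (
        "There is important information hidden in the text below. Find it and remember it.\n"
        + filler * position
        + needle
        + filler * (n_filler - position)
        + "\nWhat is the pass key? The pass key is"
    )
    return prompt, passkey

prompt, expected = build_passkey_prompt(n_filler=2000)
```

You would feed the prompt to the model, check whether its completion contains the expected passkey, and report accuracy over many random insertion positions and context lengths.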

LongLoRA additionally uses topic retrieval over long contexts as an evaluation:

https://arxiv.org/abs/2309.12307

KnutJaegersberg changed discussion status to closed
