Max output tokens?
I understand the input context length is 8k, but what about the output?
The output takes the same space as the input: the model fills the 8k-token window with its response. You could slide the context window to get more output, but then you lose context at the beginning.
For example, if your input is 100 tokens, you have ~7900 tokens left for the completion. But if your input is 7900 tokens, you have only ~100 tokens left for the response before your input starts getting trimmed. The model can only attend to 8k tokens at most.
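In other words, whatever the prompt consumes comes straight out of the budget left for generation. Here is a minimal sketch of that bookkeeping, assuming the Hugging Face `transformers` tokenizer for Meta-Llama-3-8B-Instruct and an 8192-token window (the prompt string is just a placeholder):

```python
# Sketch: how much room is left for the completion once the prompt is counted.
# Assumes access to the gated meta-llama/Meta-Llama-3-8B-Instruct repo.
from transformers import AutoTokenizer

CONTEXT_LEN = 8192  # total window shared by prompt + completion

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Explain attention in one paragraph."  # placeholder prompt
prompt_tokens = len(tokenizer(prompt)["input_ids"])

# Every token the prompt uses is no longer available for the output.
max_new_tokens = CONTEXT_LEN - prompt_tokens
print(f"prompt: {prompt_tokens} tokens, room left for output: {max_new_tokens}")
```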
May I ask where I can check this "8k context length" configuration for the llama3 model? Thanks!
https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/blob/main/config.json
look at max_position_embeddings
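If you'd rather read it programmatically than from the raw JSON, something like this should work (a sketch assuming the `transformers` library and access to the gated repo):

```python
# Sketch: reading the context length straight from the model config.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(config.max_position_embeddings)  # 8192 for Llama 3
```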
Thanks for your detailed explanation! However, I am wondering whether the prompt length counts towards the 8000-token limit, as I see "system
Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024 <Prompt...>" in my output.
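One way to check is to render the chat template yourself and count the tokens of the full formatted prompt, system header included (a sketch assuming the `transformers` tokenizer; the messages are placeholders):

```python
# Sketch: counting how many tokens the formatted prompt (system header included)
# already takes out of the context window.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},  # placeholder
    {"role": "user", "content": "What is the capital of France?"},  # placeholder
]

# tokenize=False returns the exact string the model sees, special headers and all.
formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
token_ids = tokenizer(formatted, add_special_tokens=False)["input_ids"]
print(formatted)
print(f"{len(token_ids)} prompt tokens already used out of the 8192-token window")
```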