Problem with long context
#3 opened by Se-Hun
I get an empty output string when a long context is passed to this model.
Based on my inference testing, the problem seems to occur when the input text is longer than about 2,000 tokens (around 2,040 tokens); see the sketch below.
Why does this problem occur? Is it caused by your dataset configuration?
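For reference, a rough sketch of the kind of test I mean, assuming the transformers library; the model ID and the padded prompt are placeholders, not my actual setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-model"  # hypothetical placeholder for this repo

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

# Pad the prompt until it exceeds roughly 2,000 tokens.
prompt = "word " * 2100
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
print("input tokens:", inputs["input_ids"].shape[1])

out = model.generate(**inputs, max_new_tokens=32)
# Decode only the newly generated tokens; past ~2,000 input tokens
# this comes back as an empty string for me.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print("output:", repr(tokenizer.decode(new_tokens, skip_special_tokens=True)))
```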
Se-Hun changed discussion title from "List of datasets" to "Problem with long context"
Did you check max_position_embeddings in config.json? I suspect this problem is caused by the token length limit. Also, check how the tokenizer handles your data's language, because LLaMA's vocabulary contains few tokens other than English subwords, so non-English text consumes many more tokens per character and hits the limit sooner.
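A quick way to check both points, as a minimal sketch (the model ID and the sample sentences are placeholders):

```python
from transformers import AutoConfig, AutoTokenizer

MODEL_ID = "your-org/your-model"  # hypothetical placeholder for this repo

# 1. The position-embedding limit declared in config.json: inputs longer
#    than this cannot be represented by the model's position embeddings.
config = AutoConfig.from_pretrained(MODEL_ID)
print("max_position_embeddings:", config.max_position_embeddings)

# 2. How the tokenizer splits English vs. non-English text of similar
#    length; a large gap means non-English inputs reach the limit sooner.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
for text in [
    "The quick brown fox jumps over the lazy dog.",
    "빠른 갈색 여우가 게으른 개를 뛰어넘는다.",
]:
    ids = tokenizer(text)["input_ids"]
    print(f"{len(ids):4d} tokens  <- {text}")
```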