What is the best chunk size?
#9, opened by hulianxue
Consider the following application scenario:
I have long text that, of course, cannot be fed into the embedding model all at once.
So I have to cut the text into chunks, embed them, and push the embeddings into a vector-recall system.
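To make the chunking step concrete, here is a minimal sketch of what I mean. The function name `chunk_words`, the word-based splitting, and the default values are only assumptions for illustration, not anything recommended by the model authors. Embedding and indexing would happen on the returned chunks.

```python
def chunk_words(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks.

    chunk_size and overlap are counted in whitespace-separated words
    (an assumption; a real setup would more likely count model tokens).
    Consecutive chunks share `overlap` words.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # so no redundant tail chunk is produced.
        if start + chunk_size >= len(words):
            break
    return chunks


# Hypothetical usage: try several candidate sizes on the same document.
if __name__ == "__main__":
    document = " ".join(f"word{i}" for i in range(1000))
    for size in (64, 128, 256, 512):
        pieces = chunk_words(document, chunk_size=size, overlap=size // 4)
        print(size, len(pieces))
```

The chunk size (and the overlap) are exactly the parameters I am unsure about.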
To achieve the best recall performance, what is the best chunk size?
Have you run any experiments on this?
Or do you have any suggestions based on the distribution of your training data?
Thanks!