Is the model purely extractive?

#6
by apolo - opened

Hello Numind!

Thank you for sharing your work. You have done an amazing job :)

I have been playing with the three versions of the model and I have noticed that sometimes, for some sentences, the model does not extract the text exactly as it is written.

Note: the samples I have used are in Spanish and French.

Is this normal?
How could I improve it so that it always extracts the exact text?

Thank you!

NuMind org

Hi Apolo, thanks for trying out NuExtract!

While this first version of the model has been trained to prioritize extracting text verbatim from the input, there is technically no formal guarantee of this. However, we also only trained and tested on English data, so I suspect the problem you're encountering is a result of degraded model performance from the change in language.
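Because verbatim extraction is not guaranteed, one practical safeguard is to check the output after generation. The sketch below is a hypothetical helper (not part of NuExtract) that walks a nested JSON extraction result and flags any string value that does not appear verbatim in the input text. It assumes the model output has already been parsed into Python dicts/lists, and it NFC-normalizes both sides so that accented characters common in Spanish and French compare equal regardless of how they are encoded.

```python
import json
import unicodedata


def normalize(text: str) -> str:
    # NFC-normalize so accented characters compare equal whether they are
    # stored precomposed (e.g. "è") or as a base letter plus combining mark.
    return unicodedata.normalize("NFC", text)


def iter_leaf_values(node):
    """Yield every non-empty string leaf in a nested dict/list result."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_leaf_values(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_leaf_values(item)
    elif isinstance(node, str) and node:
        yield node


def non_verbatim_values(source_text: str, extraction) -> list:
    """Return extracted strings that do NOT occur verbatim in source_text."""
    source = normalize(source_text)
    return [v for v in iter_leaf_values(extraction) if normalize(v) not in source]


# Hypothetical example: the model paraphrased the date into Spanish.
text = "La réunion aura lieu à Genève le 12 mars."
output = json.loads('{"lieu": "Genève", "date": "12 de marzo"}')
print(non_verbatim_values(text, output))  # ['12 de marzo']
```

Flagged values can then be discarded, re-generated, or routed for manual review; the check is deliberately strict (exact substring match), so it will also flag harmless differences such as changed whitespace.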

The only solutions that come to mind would be to try providing some few-shot examples or continue fine-tuning the model with your own data in the target language(s). Otherwise, we are hoping to release a multilingual model sometime in the near future.
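As a rough illustration of the few-shot suggestion, the sketch below assembles a prompt containing the extraction template, a few worked examples in the target language, and then the text to process. The function name and the `### ...` section markers are hypothetical placeholders chosen for illustration; they are not NuExtract's actual prompt format, which should be taken from the model card.

```python
import json


def build_few_shot_prompt(template: dict, examples: list, text: str) -> str:
    """Assemble a prompt: template, worked examples, then the target text.

    NOTE: the section markers are a hypothetical layout for illustration;
    use the exact prompt format documented for the model you are running.
    """
    # ensure_ascii=False keeps accented characters as-is, so example outputs
    # stay verbatim copies of the example texts.
    parts = ["### Template:", json.dumps(template, ensure_ascii=False, indent=4)]
    for example_text, example_output in examples:
        parts += [
            "### Example text:",
            example_text,
            "### Example output:",
            json.dumps(example_output, ensure_ascii=False, indent=4),
        ]
    parts += ["### Text:", text, "### Output:"]
    return "\n".join(parts)


prompt = build_few_shot_prompt(
    {"lieu": ""},
    [("Rendez-vous à Lyon demain.", {"lieu": "Lyon"})],
    "La réunion aura lieu à Genève.",
)
print(prompt)
```

The intent is that examples whose outputs are exact substrings of their inputs nudge the model toward copying text verbatim in the same language, rather than translating or paraphrasing it.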

Great, thank you for answering!

I will test it and see if I can get better results!
Eager to see your next multilingual models.

apolo changed discussion status to closed
