It cannot even answer this question: which is heavier, one kilogram of cotton or one kilogram of iron?

#16
by lucasjin - opened

It cannot even answer this question: which is heavier, one kilogram of cotton or one kilogram of iron?

Which template are you using, and have you considered asking the model in English instead?

Welp, I tested multiple ways to massage the model in English, and it seems to insist that density somehow matters in a question about weight, and thus that iron is heavier.
The model doesn't give the correct answer unless I ask in a very specific manner and template. But then I started asking the same question to a few other models: same old issue.
It is what it is, I suppose. LLM truthfulness will always be problematic when the internet (which presumably makes up the Orca 2 dataset) can't make up its mind about this simple question to begin with. I blame dumb humans.

Well, that's what happens when AI is trained on human data. It will always have some problems, and they are even more noticeable when dealing with dilemmas and common misconceptions. However, it also depends on the model. We are talking about a 13B model here; it's a powerful one, but we cannot expect it to reason better than humans. Even the most powerful ones, like Llama 70B, which answered your question correctly when I tried it, still lack a lot in reasoning, but we are getting there! To be honest, I am still surprised at how well these models with 13B and even 7B params are doing, and I cannot wait to see them become even more optimised.

I got a correct answer with a slightly restructured prompt, though not on the first try.
image.png
The model is from https://huggingface.co/TheBloke/Orca-2-13B-GGUF
orca-2-13b.Q6_K.gguf

Yeah, after playing around I also managed to get interesting answers sometimes, though it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of getting us a win.

Yeah, after playing around I also managed to get interesting answers sometimes, though it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of getting us a win.

LLMs should be deterministic unless you sample.
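To illustrate the point about sampling, here is a minimal toy sketch, not how Orca 2 or any particular runtime is implemented. The candidate answers and their scores (`candidates`, `logits`) are made-up values, and `greedy_pick` / `sample_pick` are hypothetical helper names. It only shows that always taking the highest-scoring token gives the same result on every run, while drawing from the softmax distribution can give different results.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature flattens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: always take the index with the highest score."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature, rng):
    """Sampling: draw an index at random, weighted by softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Made-up next-token scores for three candidate answers (an assumption,
# not real model output).
candidates = ["same", "iron", "cotton"]
logits = [2.0, 1.8, 0.1]

# Greedy decoding repeated 20 times: only one distinct answer ever appears.
greedy_runs = {candidates[greedy_pick(logits)] for _ in range(20)}

# Sampling repeated 200 times at temperature 1.0: several answers appear.
rng = random.Random(0)
sampled_runs = {candidates[sample_pick(logits, 1.0, rng)] for _ in range(200)}

print("greedy:", greedy_runs)
print("sampled:", sampled_runs)
```

With scores this close, the wrong answer shows up often under sampling, which may be why repeated attempts feel like gambling. Setting the temperature to zero (or disabling sampling) in your runtime should remove that source of variation, assuming nothing else in the prompt changes.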

@lucasjin Works for me off the bat.

Screenshot 2023-11-30 172700.png

Yeah, after playing around I also managed to get interesting answers sometimes, though it feels like gambling. Well, let's hope this "gambling game" will have better and better chances of getting us a win.

LLMs should be deterministic unless you sample.

Or... unless you change the prompt like I did? You can always add some variation to it, like having a previous conversation with the model or rephrasing the prompt.

I am not able to run the model properly. I am on an RTX 4000, and it takes a long time to generate a single answer. Is there any solution for that?
