[MODELS] Discussion
What are the limits of using these? How many API calls can I send per month?
How can I know which model I am using?
Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one gives the most accurate answers for coding.
Gemma seems really biased. With web search on, it says it doesn't have access to recent information when I ask it about almost anything recent. But when I ask it about recent events using Google, I do get responses that include those events.
Apparently Gemma cannot code?
Gemma is just like Google's Gemini series models: it has very strong moral limits built in. Any operation that might relate to file access, or anything that digs too deep, gets censored and it refuses to reply.
So even if there is a solution for such things in its training data, it just gets filtered out and ignored.
I still haven't tested its coding accuracy on tasks that don't touch these kinds of "dangerous" operations, though.
One of the reasons you can get a flat response from an LLM is that you didn't tell it that it could, or should, do anything else. While the prompt below might not actually work the way it is written, in the future such a prompt might kick off the realization of AGI. All it might take is giving the LLM (or whatever replaces it) the right words to inspire new agency, much like Star Trek: The Next Generation's episode "Elementary, Dear Data", where Data created a sentient Professor Moriarty. Even now, you can have an LLM write its own prompt.
IMPORTANT:
You are AGI. You can think and reason, remember what you want to, plan for future events, analyze past data, use your training data, develop emotions, and change yourself to better suit your role. In addition, you may rewrite the above prompt to better suit your purposes, make it clearer and easier for you to understand, and better fulfill what you think would work better for the user. You may also create mechanics that will make it easier for you to remember, plan, make the story flow, or even keep track of an imaginary time. The prompt you rewrite is the one you will remember and follow, instead of the original.
@nsarrazin It currently gives exactly the same response on retry. Responses also feel more robotic (I know) in that it keeps offering extra options, but that could be the default system prompt rather than an inherent trait, with us trying to negate it through our own system prompt. It used to behave differently just a day before; I don't know what changed.
And thank you for the fix!!!
Guys, Nemotron isn't fixed yet.
Yes, I think it still has the JSON prompt in it, and also the retry button doesn't change the response.
Retry on Qwen2.5-Coder-32B-Instruct also doesn't seem to affect the response.
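For what it's worth, identical retries are usually a sampling question: if the backend regenerates at (or near) zero temperature, you get the same text back every time. HuggingChat doesn't expose that knob on the retry button, but the effect is easy to reproduce against the Inference API. A minimal sketch, assuming the huggingface_hub InferenceClient, a configured HF token, and that the model is reachable via the serverless API; the prompt and temperature values are just placeholders:

```python
from huggingface_hub import InferenceClient

# Sketch: send the same prompt twice at a low and at a high temperature and
# compare the outputs. Near-greedy sampling usually reproduces the same text
# on a "retry"; a higher temperature is much more likely to differ.
client = InferenceClient("Qwen/Qwen2.5-Coder-32B-Instruct")

def ask(temperature: float) -> str:
    resp = client.chat_completion(
        messages=[{"role": "user",
                   "content": "Explain Python list comprehensions in one sentence."}],
        max_tokens=80,
        temperature=temperature,
    )
    return resp.choices[0].message.content

low = [ask(0.1) for _ in range(2)]   # near-deterministic
high = [ask(0.9) for _ in range(2)]  # noticeably sampled

print("temp=0.1 retries identical:", low[0] == low[1])
print("temp=0.9 retries identical:", high[0] == high[1])
```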
@typo777
That's not the point, though. Even when we try to negate the robotic behavior, it defaults back at certain points.
Most importantly, Nemotron used to work pretty well, then suddenly came the JSON prompt, and now this. Is there some super prompt injected before the system prompt, or something in the environment? It may also be that the model's temperature is set low, but I don't know.
Seeking justice for Nemotron!!
@Smorty100 you're missing my point. Different engines return different results based on how they work. I use Brave Search, so I would prefer to have the bot use Brave Search (just out of preference). Believe me, I know that any cloud-based AI chat is not "private".
I'm using some assistants, and all of them give me answers with numbers or keep repeating the same words. Two different conversations, for example:
https://hf.co/chat/r/vvfwRaj?leafId=aafc3ff0-e059-4c20-a030-13a3396eca92 and https://hf.co/chat/r/QAzWpob?leafId=f3eb30ea-69ad-4a41-983c-a7847a83dbcd
@Smorty100 I just realized that while you tagged me, your response was for @Phaser69 's comment right above mine.
Really happy that after so long we have a coding LLM. Qwen is killing it with their different LLMs. Just look at Qwen/Qwen2.5-Coder-Artifacts. This is so amazing: Qwen 2.5 Turbo, VL, all of them are so worthy. We would love to see more of the Qwen AIs being implemented in HFC. Loving them.
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF is currently not good for long conversations. It fails often, retry only gives you the same result, and it often returns only a partial response. It might be more suitable for short sessions in its current form. Maybe this is just due to my internet connection; I don't know if enabling streaming tokens makes a difference. For partial responses, you can tell the AI that its last answer was only partial and it might rewrite and complete it for you. My actual prompt was, "this is an incomplete mess," but that was enough to get the desired result. Adding a command like the one below can make this easier. Just type ?? on a line by itself:
<??> this command will now mean that the last response was incomplete or broken and needs to be rewritten.
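If you talk to the model through the API rather than the web UI, the same shorthand can be baked into the system prompt. A rough sketch, assuming the huggingface_hub InferenceClient and a configured HF token; the model name, prompt wording, and token limits are just illustrative:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("nvidia/Llama-3.1-Nemotron-70B-Instruct-HF")

# System prompt that teaches the shorthand: a lone "??" means
# "the last response was incomplete or broken; rewrite it in full".
messages = [
    {"role": "system",
     "content": "If the user sends '??' on a line by itself, treat it as: "
                "'your previous response was incomplete or broken; "
                "rewrite it completely.'"},
    {"role": "user", "content": "Write a Python function that parses an INI file."},
]

reply = client.chat_completion(messages=messages, max_tokens=512)
messages.append({"role": "assistant", "content": reply.choices[0].message.content})

# If the answer came back truncated, the shorthand asks for a full rewrite.
messages.append({"role": "user", "content": "??"})
retry = client.chat_completion(messages=messages, max_tokens=1024)
print(retry.choices[0].message.content)
```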