[MODELS] Discussion
What are the limits of using these? How many API calls can I send per month?
How can I know which model I am using?
Out of all these models, Gemma, which was recently released, has the newest information about .NET. However, I don't know which one has the most accurate answers regarding coding
Gemma seems really biased. Even with web search on, it says it doesn't have access to recent information when I ask it about almost any recent event. But when I ask about the same recent events on Google, I get responses covering them.
apparently gemma cannot code?
Gemma is just like Google's Gemini series models: it has very strong moral limits built in, and any operation that might relate to file access, or anything that goes too deep, gets censored and refused.
So even if there are solutions for such things in its training data, they will just be filtered out and ignored.
I still haven't tested its coding accuracy on things unrelated to these kinds of "dangerous" operations, though.
is it possible to know what parameters this models are running ?
It's all here! https://github.com/huggingface/chat-ui/blob/main/.env.template
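For context: chat-ui reads its model list (and each model's default sampling parameters) from a `MODELS` environment variable holding a JSON array. A minimal sketch of one entry, loosely following that template; the values here are illustrative, not HuggingChat's actual production settings:

```json
{
  "name": "mistralai/Mixtral-8x7B-Instruct-v0.1",
  "displayName": "Mixtral 8x7B Instruct",
  "parameters": {
    "temperature": 0.6,
    "top_p": 0.95,
    "repetition_penalty": 1.2,
    "max_new_tokens": 1024
  }
}
```

Check the actual `.env.template` in the repo for the current schema, since it changes between releases.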
thanks this is super useful OWO
What happened to Falcon? It was my favorite. :(
@SAMMdev Falcon was too costly to run at scale (for now), we might put back a more optimized version in the future
I would like to use "mistralai/Mixtral-8x7B-Instruct-v0.1";
Please, could you tell me what the precision of the model behind the chat is? Thanks
What if we use Falcon 70B?
smaug 72B would be a great addition
I'm unable to get output from CodeLlama
I'm also voting for Smaug 72B. We already have the two Llama 70B models on here, so to me it seems reasonable to integrate this one as well.
This is probably not going to happen, but xai-org/grok-1 would be insane to have here
IYH Why is the title of most chats (on the left panel's roster) "🤗 Hello! I am a language model AI assistant."?
This implies that the system prompt of my assistants is not the fundamental prompt, but that there is an inbuilt base prompt run before my system prompt. Is this roughly correct, and if so, how do I change this base prompt for the Mistral LLM?
Could you consider DBRX Instruct and Command-R? The official space for DBRX Instruct is too limited (it only allows for a 5-turn conversation) and there is no space for Command-R.
IYH thank you for your advice. Apologies I have no idea what the concepts mean or what to do "Could you consider DBRX Instruct and Command-R? The official space for DBRX Instruct is too limited (it only allows for a 5-turn conversation) and there is no space for Command-R." (fwiw I prompted mistral about it and it did not know either.)
Would you kindly elaborate (or point me towards a resource that explains this)?
Huggingface will notify you when someone posts in a discussion you've commented on, even if they didn't directly reply to you. I was suggesting two new models, unrelated to your question.
Which model is better to use?
How can I tell the difference between them?
@DYB5784 HF Chat has a Mistral 7B model set up with a system prompt for the task of summarizing the first chat prompt/msg into a title for the chat history log, so unless one explicitly addresses that in the first msg, it is what it is, I guess. And we can always rename it. Still, I think it would be awesome if we could customize the naming style/prompt ourselves.
Is openchat/openchat-3.5-0106 coming back? Was it removed to be upgraded?
It looks like they also removed the Meta models :(
Hope they add command r instead of bringing those back tbh.
What is command r? I'm a newb.
Command-R+ is a new LLM from Cohere that overtook GPT-4 on the Open LLM Leaderboard.
hey!
On HuggingChat we aim to always propose a small selection of models which will evolve over time as the field of ML progresses forward 🔥
Stay tuned!
Yup, small models are better and lighter (cost-friendly), plus now HuggingChat has internet access, so small models like Mixtral, Nous Hermes, etc. can perform better in many areas than many 70B models.
We are happy to see what's coming next 🔥🔥.
The Meta ones felt misaligned and gave a lot of refusals. The 70b code one would lecture and moralize even with nothing bad in the prompt.
I hope LLaMA 3 isn't as misaligned a mess.
This is because they do not do fine-tuning. Many times a fully fine-tuned Llama 7B is better than a non-fine-tuned Llama 70B.
@Victor Thank you for the new model! But if possible, I think a slight warning/notification about which model will be taken down would be very helpful to us!
Goodbye, OpenChat! it was really good for 7B!
Agree with 1st point
Cohere Command R+ is now on HuggingChat!
... Hey Victor, if you're gonna surprise us with new models like this, then you can remove anything you want without notifying anyone, not even Clement, xd.
But jokes aside, this is just great. If your adding/removal policy stays like this, in 3 months we will have Hugging Face assistants for everything: long context, coding, reasoning/creativity, etcetera.
Thanks a lot!!!
P.S.: I was expecting just Command R, but having Plus with the full HF interface means that I will be able to make a lot of assistants that in the past only worked decently as GPTs with GPT-4.
you can remove anything you want without notifying anyone
@Ironmole
you're literally OK with leaving all the active chats abandoned, aren't you? What can we say here? Lots of users will be kinda saddened if active/hanging chats are suddenly no longer continuable. (I know they usually take down the models with the least traffic, so that's how it is, I guess.)
and it seems all the other assistants have been migrated to mistralai/Mixtral-8x7B-Instruct-v0.1.
Yes we migrated all assistants with deprecated models to the default model, which at the time was Mixtral 8x7B!
command r + is really good
I'm worried that it's not gonna be free forever. Like, don't get me wrong, I have FULL faith in the Hugging Chat team; it's just that in my eyes this is a perfect replacement for ChatGPT. So I just need some reassurance it'll stay free.
I think that it'll stay free.
But if they have budget issues,
they can integrate ads to keep it free forever,
and also introduce premium features (like some premium models usable only by premium users, or a Pro badge, etc.).
Please leave Command R Plus unquantized on HuggingChat; I'd even pay $30 a month for it. In my opinion it's perfect for translating. I would use it locally, but I don't have a server that could run the full model, and using quants would make the model worse.
I would like to pay $9 per month for longer context + a relaxed rate limit + unquantized usage of Hugging Face Chat
Hope you guys keep HuggingChat free forever 🙏
As Hugging Face gives free access to host unlimited models, datasets, and Spaces,
I hope HuggingChat will remain free.
A famous Hindi quote - "Umeed pe duniya kayam hai"
Translation - "The world is alive in hope."
Well, we'll see what happens in the future.
Upon closer inspection, it seems like Nous-Hermes-2-Mixtral-8x7B-DPO is still a bit better than Command R Plus at translating from Chinese to English. It understands the meaning a bit more and especially writes it in a way that is far better to read. I wonder how good the new 8x22B Instruct model from Mistral is going to be. Anyway, all the models are really good and have amazing uses! I hope we can access the ones that get released in the future too. Thank you very much for hosting them.
Umeed pe duniya kayam hai
💯
@nsarrazin will assistant creators get a choice of which model to migrate to? I think this should be an option, as recreating an assistant on another model is like starting anew.
a past comment:
- What will happen to an Assistant if a model is taken down? Migrate to a new LLM with the context tokens + prompt, since we/bot authors can change the sys prompt of the assistants anytime? Unlikely, I guess. Or we could have a migration system for our old chats.
- Shall there be a "View Sys Prompt" button, just like in regular chats, beside/below the bot button? The assistant button at the top shows only the latest prompt, while the chat might have started with another prompt. (Changing it doesn't really affect an already active chat; it only once recognized the changed sys prompt when I mentioned part of it.)
Zephyr Mixtral 8x22B from Hugging Face coming soon?
zephyr-orpo-141b-A35b-v0.1
What happened to the openchat model why was it removed
Because very few people are using it.
We just released HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 on HuggingChat!
Try it out here: https://huggingface.co/chat/models/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
Shout out to @nicksuh who called it early 🎉
@Gerrytheskull models come and go.. nothing is permanent, sadly. Besides, OpenChat wasn't being used by that many users, I think. Plus, new models were added.. Command R+ and now Zephyr.
@nsarrazin
could you add a model usage-over-time graph on the model list page?
It would be more engaging and fun, plus new users could see what's trending.
Check out my models - https://hf.co/chat/assistant/65c8539d02294f8760ccf784
- A feature like Assistant of the Week (like Spaces have Space of the Week)
Is it possible to add the WizardLM-2-8x22B model to the available models?
Wizard seems like a killer model! We would love to see it on HuggingChat.
There is only one big problem with this: it has 141B parameters, which makes it slow.
The CohereForAI/c4ai-command-r-plus 110B params model works normally, so this should also work in normal mode. Additionally, there is the HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 model with 141B params that also works quickly and is available in HuggingChat.
@CmetankaPY Ohh, I forgot about them.
@CmetankaPY I found a discussion stating that Zephyr has only 35B active parameters
https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/discussions/9
Did anyone notice that Zephyr 141B-A35B isn't even nearly as good as Command R+, despite having more parameters? I also noticed that some smaller models perform way better than Zephyr 141B-A35B.
Because Zephyr has only 35B active parameters, not 141B.
Read this for more info - https://huggingface.co/HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1/discussions/9
Please add AI generated images.
You can use image generation in chat using Pollinations.
Some examples:
https://hf.co/chat/assistant/6612cb237c1e770b75c5ebad
https://hf.co/chat/assistant/65bff23f5560c1a5c0c9dcbd
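For anyone curious how those assistants do it: Pollinations serves a generated image straight from a prompt URL, so an assistant only has to emit a markdown image link. A rough sketch in plain Python; the `width`/`height` query parameters are assumptions based on the service's public URL scheme, not a documented guarantee:

```python
from urllib.parse import quote

def pollinations_image_url(prompt: str, width: int = 768, height: int = 768) -> str:
    """Build a pollinations.ai URL that returns an AI-generated image for `prompt`."""
    # The prompt is percent-encoded so spaces and punctuation survive in the URL.
    return (
        f"https://image.pollinations.ai/prompt/{quote(prompt)}"
        f"?width={width}&height={height}"
    )

url = pollinations_image_url("a red fox in the snow")
# Embedding the URL as markdown makes the chat UI render the image inline:
markdown = f"![a red fox in the snow]({url})"
```

Since the model only needs to write a URL, no extra tools or plugins are required on the HuggingChat side.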
Llama-3 seems great, but I expected it to beat GPT-4 😅. So far I can't see any open-source model that comes close to Command R+ performance
Wizard beat Command R+ and is even a very good competitor to ChatGPT
I believe Wizard will be the new open-source king, but I can't find it anywhere, I think Microsoft deleted it for some reason.
what did Satya see
Hey Victor, could you adjust the repetition penalty for Llama? Because I'm trying to do some creative writing, but it literally gives me the same output every time I retry.
Just do it yourself from the advanced settings below the model name.
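For anyone wondering what that setting actually does: most inference stacks implement the CTRL-style repetition penalty, which rescales the logits of tokens that have already appeared before sampling the next one. A minimal sketch in plain Python (not HuggingChat's actual code):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Penalize tokens that were already generated so rerolls diverge more.

    Positive logits are divided by `penalty` and negative ones multiplied by it,
    so in both cases the token becomes less likely. penalty=1.0 is a no-op.
    """
    adjusted = list(logits)
    for tok in set(generated_ids):
        if adjusted[tok] > 0:
            adjusted[tok] /= penalty
        else:
            adjusted[tok] *= penalty
    return adjusted

# Tokens 0 and 1 were already generated; both get pushed down.
scores = apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.0)
# scores == [1.0, -2.0, 0.5]
```

Raising the penalty a bit above 1.0 in the advanced settings therefore makes retried generations diverge more, at some cost in coherence.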
The quality of Dolphin-Mistral/Mixtral from Cognitive Computations is much better than that of Nous-Hermes, so it may be a more suitable choice. I also used them in my own local Ollama, until Command-R+ subverted the game.
P.S. Llama 3 is so bad for my use. It is not even as good as the quantized versions of the above two models.
I just checked the model configuration of Command-R-Plus and noticed that the context window is limited. Is it because of cost considerations? If so, I hope you add a Q4 version with 128K context window support; it should be much faster too.
But what about quality? Quantization decreases quality very much.
Then Q8? With extremely low temp, top_p, and top_k. In any case, the quality of Command-R+ surpasses most models.
In addition, the impact of quantization on quality is not so devastating. The latest research can even quantize down to 1-bit and achieve nearly the non-quantized effect.
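To put rough numbers on the quality/memory trade-off: the weights of a model take roughly params × bits-per-weight / 8 bytes, so for a ~104B-parameter model such as Command-R+ the back-of-the-envelope math looks like this (a sketch that ignores KV cache and runtime overhead):

```python
def approx_weight_memory_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB: params * bits / 8."""
    return n_params_billion * bits_per_weight / 8

# Command-R+ has ~104B parameters:
fp16 = approx_weight_memory_gb(104, 16)  # ~208 GB
q8 = approx_weight_memory_gb(104, 8)     # ~104 GB
q4 = approx_weight_memory_gb(104, 4)     # ~52 GB
```

This is why Q8, and especially Q4, makes hosting such a model dramatically cheaper even before considering the quality loss.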
Detailed review of Llama 3 70B:
Coding: 8/10
Capability: Llama 3 is capable of generating code snippets in various programming languages, including Python, Java, C++, and JavaScript. It can also help with code completion, debugging, and optimization.
Limitation: While it can generate code, it may not always be correct or efficient. It may also struggle with complex algorithms or nuanced programming concepts.
Example: I asked Llama 3 10 complex coding questions. It generated correct solutions for 9, but some of them were not the best ones.
Creative Writing: 9/10
Capability: Llama 3 is capable of generating creative writing, including stories, poetry, and dialogues. It can understand context, tone, and style, and produce writing that is engaging and coherent.
Limitation: While it can generate creative writing, it may lack the nuance and depth of human-written work. It may also struggle with complex themes or abstract concepts.
Example: I gave it 10 creative story generation tasks. It generated engaging and well-structured stories, but they lacked the emotional depth and complexity of human-written work.
Multiple Languages: 8.5/10
Capability: Llama 3 is capable of understanding and generating text in multiple languages, including English, Hindi, Chinese, Japanese, Spanish, French, German, Italian, and many others. It can also translate text from one language to another.
Limitation: While it can understand and generate text in multiple languages, it may not always be perfect in terms of grammar, syntax, or idiomatic expressions.
Example: I gave Llama 3 10 paragraphs in different languages to translate. It generated accurate translations, but they lacked the emotion, nuance, and cultural context of a human.
General Knowledge: 9/10
Capability: Llama 3 has a vast knowledge base and can answer questions on a wide range of topics, including history, science, technology, literature, and more.
Limitation: While it has a vast knowledge base, it may not always be up-to-date or accurate. It may also struggle with abstract or nuanced concepts.
Example: I asked Llama 3 10 different complex GK questions. It generated accurate and informative responses, but they lacked depth and nuance.
Maths: 6.5/10
Capability: Llama 3 is capable of solving mathematical problems, including algebra, geometry, calculus, and more. It can also help with mathematical concepts and theories.
Limitation: While it can solve mathematical problems, it may not always be able to explain the underlying concepts or find an efficient approach, and it many times gives wrong solutions.
Example: I asked Llama 3 to solve 10 complex high-school problems. It generated correct solutions for only 6; in 1 it followed the right method halfway, and the remaining 3 were purely incorrect.
Internet Search: 8/10
Capability: Llama3 can search the internet and provide relevant information on a wide range of topics. It can also help with finding specific information or answering complex questions.
Limitation: While it can search the internet, it may not always be able to evaluate the credibility or accuracy of the sources it finds.
Comparison with other models:
Llama 2
Llama 3 is a significant improvement over LLaMA 2 in terms of its capabilities and performance. It has a more advanced language model, better understanding of context and nuance, and improved generation capabilities. It is also more knowledgeable and accurate in its responses.
(More to be added)
Overall, Meta-Llama-3-70B-Instruct is a powerful and versatile language model that can perform a wide range of tasks and answer complex questions. While it has its limitations, it is a significant improvement over previous language models and has the potential to revolutionize the field of natural language processing.
.....................................................................................................
If you liked the review and want reviews of more models, give a thumbs up 👍
Detailed review of Llama 3 70B:
Please do not use LLM-style correct-sounding nonsense to describe the model's performance, thank you!
Note: Why do I think Dolphin performs better?
- System-prompt-free cross-language capabilities. When communicating in Chinese, Llama (1/2/3) or vanilla Mistral 7B must be induced with system prompts to spit out fragmented Chinese. Nous-Hermes, CR+, and the Dolphin series do not have this problem.
- Uncensored. Dolphin will never reject you.
- It even has a programming-specialized version based on StarCoder2.
Please do not use LLMs-style correct nonsense to describe the model's performance, thank you!
I wrote this entirely by myself, and you're claiming it's nonsense generated by an LLM.
Repetition penalty for llama3 needs to be higher
I think we should add Dolphin as it's a good model
Noticed that current chats are not being named. Can we assume it's being worked on for now?
Do you plan to release mistralai/Mixtral-8x22B-Instruct-v0.1 to the chat ? meta-llama/Meta-Llama-3-8B-Instruct could be also great.
Yeah, the 8x22B Instruct is AMAZING, I'd like to use it over the chat too.
Command-R-Plus is already overloaded there. Is 8x22B really a reasonable choice? Llama 3 8B could replace Mistral 7B as the default configuration; anyway, it is broken now.
Are all the models that come and go on HuggingChat open-source?
yes sir
[New Model REQUEST] MTSAIR/MultiVerse_70B
This model outperforms Command R+, Llama 3 70B, and many more on the Open LLM Leaderboard.
As Command R+ is facing many issues, this model is a great alternative to it,
and it has only 70B parameters.
This model is currently #1 chat model on Open LLM leaderboard.
License - https://huggingface.co/MTSAIR/MultiVerse_70B/discussions/7#66278c8e430a12425331b183
Model Link - https://huggingface.co/MTSAIR/MultiVerse_70B
👍 to support this model.
(Hugging Face team will add Model on Community Demand)
[New Model REQUEST] MTSAIR/MultiVerse_70B
It is based on Alibaba's Qwen 72B, which means it is under severe censorship. Test scores sometimes don't make sense.
I suggest that Chinese models be treated with caution. They are never disappointing in terms of overfitting and Chinese political censorship.
Conclusion: You'd better try this model before recommending it. Their Space is broken. On the other hand, quantizing Command-R+ or replacing it with the 35B Command-R is still a cost-effective choice.
For a full replacement, I would recommend this list of models:
- Command-R/Command-R+ at Q6 or Q8
- Llama 3 70B and subsequent versions with larger parameters
- Llama 3 8B, as a representative of small models and as the TASK_MODEL
- Phi-3-mini, which can also be used as the TASK_MODEL
- Dolphin/Nous-Hermes Mixtral 8x7B
- Anything else you want to add, such as Mistral-OpenOrca, Dolphin-Mistral, Qwen1.5... not including vanilla Mistral, Mixtral 8x7B, or Gemma, though Mixtral 8x22B is acceptable (better deployed at Q6).
*All the above quantization suggestions are based on llama.cpp and GGUF formats.
I suggest that Chinese models be treated with caution. They are never disappointing in terms of overfitting, just like their students.
@Mindires Hey, please treat every country and individual with respect. This is a community platform. So, Please do not spread hate or anything similar.
"Everybody is a genius. But if you judge a fish by its ability to climb a tree, it will spend its whole life believing that it is stupid." - Albert Einstein
[New Model REQUEST] Microsoft/WizardLM-2
This model outperforms Command R+, Llama 3 70B, Mixtral 8x22B, and many more,
and gives tough competition to Claude 3, Gemini Ultra, GPT-4, etc.
License - Apache 2.0
Model Link - https://huggingface.co/alpindale/WizardLM-2-8x22B [unofficial] (official to be added soon)
👍 to support this model.
(Hugging Face team will add Model on Community Demand)
[New Model REQUEST] Microsoft/WizardLM-2
-snip-
The legality of that is questionable, since Microsoft took it down.
It's not legally questionable. They released the model under the Apache 2.0 license, so anyone with a copy of the model can use, modify, and distribute it according to the license terms.
@EveryPizza
Microsoft removed WizardLM-2 because it was uncensored.
So, they will post it again soon.
Microsoft removed Wizard2 because it was uncensored.
So they will censor it and release it again
Review of Phi-3 Mini 4k Instruct:
Coding: 8.5/10
Capability: As Phi-3 is fine-tuned on high-quality data from GPT-4, the performance is truly magical for its size of just 3.8B. It excels in code completion, debugging, and optimization tasks, making it a valuable tool for developers.
Limitation: Phi-3 may occasionally produce code that is not optimal or entirely correct. It can encounter difficulties with complex algorithms or intricate programming concepts that require deep domain expertise.
Example: When tasked with 20 complex coding questions, Phi-3 delivered correct solutions for 19. However, some solutions were not the most efficient or elegant. Still, it outperforms ChatGPT 3.5 (the free version).
Creative Writing: 9/10
Capability: Phi-3 has a strong capability for creative writing, crafting stories, poetry, and dialogues with a clear understanding of context, tone, and style. Its outputs are engaging.
Limitation: It's creative, but sometimes it doesn't hit the feelings or the depth of something a person would write, especially with complex or deep themes.
Conclusion: Because of the GPT-4 dataset, it is greatly advanced in creative writing.
Multiple Language Proficiency: 7/10
Capability: Phi-3 is capable of understanding and generating text in numerous languages, including English, Hindi, Chinese, Japanese, Spanish, French, German, Italian, and more.
Limitation: While Phi-3 is proficient in multiple languages, there are many lapses in grammar, syntax, or idiomatic expressions, which can detract from the authenticity of the text.
Example: Phi-3 translated 20 paragraphs from various languages with high accuracy. However, the translations many times missed the emotion and meaning of the text.
General Knowledge: 9/10
Capability: Phi-3 has a lot of knowledge compared to its size. (It outperforms all 7B and 13B models, many 30B models, and some 70B models.)
Limitation: Because its size is small, Phi-3's information may not always be current or completely accurate. It can also struggle with detailed discussions on historical topics.
Example: Phi-3 was asked different GK questions. It provided accurate and informative responses, but occasionally lacked depth (the reason is its size).
Mathematics: 7/10
Capability: Phi-3 is proficient in solving mathematical problems, including those in algebra, geometry, calculus, and beyond. It can assist with understanding mathematical concepts and theories.
Limitation: Phi-3 may not consistently explain the underlying concepts clearly or choose the most efficient methods, and it can sometimes provide incorrect solutions.
Example: Phi-3 was tasked with solving 20 complex high school mathematics problems. It correctly solved 13, partially followed the right method for 3, but the remaining 4 were incorrect.
Internet Search: 8.5/10
Capability: Phi-3 can effectively search the internet to provide relevant information on a wide array of topics. It can assist in locating specific details or answering intricate questions.
....................................................................................................
Some useful tips
- Phi-3 + internet > GPT-3.5
- Phi-3 is currently the best model for local AI.
....................................................................................................
Comparison with other models:
Compared to Phi-2, Phi-3 represents a significant leap in handling complex tasks such as coding, mathematics, general knowledge, and creativity. It demonstrates an advancement in language model capabilities, offering a more sophisticated understanding of context and delivering highly knowledgeable and accurate responses.
....................................................................................................
Overall:
Phi-3 is a magical model. We can see a vast difference between it and its competitors. It surpasses all 7B models and nearly all 13B models in performance. Eagerly waiting for the release of Phi-3 7B and 13B.
....................................................................................................
Thanks to Microsoft for this high-quality model, and to the HuggingChat team for making it available for free on HuggingChat.
Fun fact: the HuggingChat team is so busy that they even forgot to officially announce 😅
that Phi-3 is available on HuggingChat.
So, here is the link, go check it out -> https://huggingface.co/chat/models/microsoft/Phi-3-mini-4k-instruct
......................................................................................................
If you find this review helpful and would like more reviews of similar models, please let me know! 😊
You can follow me to get notified about the next model review.
See u in the next review 🤗
[New Model REQUEST] Microsoft/WizardLM-2
I created a demo of the WizardLM 2 7B model on Spaces.
Check it out - https://huggingface.co/spaces/KingNish/WizardLM-2-7B
While many community members are requesting models based on the Open LLM Leaderboard scores, I believe the mods of this community also keep an eye on the board. If a model seems a fit, they will hopefully add it. We all want the best models to be present in Hugging Face Chat.
I'm starting to face issues with Command R+; it's starting to hallucinate badly, doesn't follow requests properly, and gives one-word lazy answers even when I explicitly tell it to provide in-depth, expanded responses in the system prompt.
How can i add a new model by myself?
By using chat-ui directly: https://github.com/huggingface/chat-ui
This is not the right place to post this @zoyahammad (here we discuss models on HuggingChat)
Llama 3 has a model with 1M+ tokens context. Is it possible to add this model to the available chat models?
https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k
What about a 'community models' section where huggingchat would display the best spaces of good models and use them?
How can we add new models? IBM just released a new set of open-source models. I'd like to see them here too!
@CosmicSound someone asked the same question before, and the answer was to open a pull request on the GitHub repo for chat-ui
Why does it show that this discussion is "paused"?
So we won't be seeing WizardLM-2 8x22B on HuggingChat anytime soon?
We need a list of alternatives to HuggingChat, so that if one model can't be found on here, it can be found somewhere else...
Please see this conversation using microsoft/Phi-3-mini-4k-instruct:
https://hf.co/chat/r/7g1o5NL
Please add Smaug 70B, a fine-tuned version of Llama 3.
Guys, since this morning HuggingChat has been acting weird; most of the time it keeps searching for an answer, and it is not performing web search like it did a few days back.
Mistral 7b v0.3 should be a no-brainer, it adds native function calling capabilities and is, as far as I understand, compatible with and higher quality than v0.2
please add the following model to the list of available models https://huggingface.co/Bin12345/AutoCoder
Please replace Phi-3-mini with Phi-3-medium-128k.
If I want to set up an assistant oriented toward a specific topic concerning the application of labor law in my company, how do I proceed?
The goal is to reference a set of documents related to collective agreements, which are in PDF or Word files. What is the limit on document size, and where do I upload the files to reference them?
CohereForAI/c4ai-command-r-plus gets very slow and basically unusable for me after 2 - 3 requests. It only shows the three dots after I send my message but never actually seems to generate a reply. Is this expected?
Having Codestral by Mistral AI available on HuggingChat would be really great. It's a super speedy code model with a size of 22B parameters, and it's got a larger context window for larger codebases. Since the departure of CodeLlama we haven't had a coding model on HuggingChat, and Codestral would fit that bill perfectly.
@Smorty100 Codestral does not allow hosting/running it like that. It has a non-production research license.
https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
Is the SOTA open source model for coding per the lmsys leaderboard
Are you going to add any of Nvidia's new models?
https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Instruct
Is the SOTA open source model for coding per the lmsys leaderboard
We are looking at it :) cc @nicolas @olivierdehaene
I would like to express my sincere gratitude to the team for your exceptional work in providing accessible and open-source AI chatbot options.
I believe that integrating the Qwen2-72B-Instruct or Qwen2-7B-Instruct model would be highly beneficial. During my testing, I found that it excels in Thai language processing, delivering remarkably high-quality results.
I hope the team will consider incorporating these models into HuggingChat service. Thank you once again for your dedication
Looks like gemma-2-27b-it is broken. Maybe you are using a wrong chat template or something?
Does anyone know what happened to the Zephyr model? It was the biggest, but it suddenly disappeared. What happened to it?
Is it possible to add "LLM Compiler FTD" the new coding model ?
The Zephyr model is gone, any idea why? It was my fave; I tried looking around for updates, found nothing on it, and no other sites host Zephyr chat either.
The Zephyr model is gone, any idea why? It was my fave; I tried looking around for updates, found nothing on it, and no other sites host Zephyr chat either.
@victor @nsarrazin Yes, HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1 is an amazing model. I am sad that it was removed; any plans to bring it back?
It was probably the best model overall, since it was uncensored and had good responses. I enjoyed using it.
We try to rotate models from time to time, to showcase the latest releases from the community. We might keep models longer if they have high usage but since this was not the case for this Zephyr model, we opted to rotate it out in favor of Gemma 2.
If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!
I hope the model comes back; it was by far the most convenient one to use compared to the others.
@nsarrazin Right now Gemma 2 seems to be missing. Is this some sort of licensing issue, or did something go down internally perhaps?
@nsarrazin Let's add usage-per-week graphs!
If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!
I think that a better approach would be to integrate the most performant and powerful models according to benchmarks and to keep models that excel at particular tasks, like Command R+ for natural language tasks, for example. That would be a far better approach for adding models than adding models just by demand.
Is it just me, or is Command R+ not working?
Is it just me, or is Command R+ not working?
R+ stopped working on my account too.
Is it just me, or is Command R+ not working?
R+ stopped working on my account too.
So it must be having problems, I hope they see us and fix it.
We try to rotate models from time to time, to showcase the latest releases from the community. We might keep models longer if they have high usage but since this was not the case for this Zephyr model, we opted to rotate it out in favor of Gemma 2.
If there's high demand from the community for a model we can consider adding it, so let us know if that's the case!
Please never remove Command R+. It's the best one you've ever had and it should be permanent.
I don't think they're going to remove Command R+ (even though at the moment it's quite buggy), but I think having another model with a large context window and good reasoning (like Qwen2 or maybe Llama-3-70b with expanded context window) would be a nice thing.
Is Command R+ currently working?
Command R+ can always be used on Cohere's site though, and it's way faster than HuggingChat: https://coral.cohere.com/?_gl=1*9y14tv*_gcl_au*NTYyMTk5NDY2LjE3MTg4Njg5OTA._ga*MTIzODgzMTgzMi4xNzE4ODY4OTkw_ga_CRGS116RZS*MTcxOTg1MTA4My45LjEuMTcxOTg1MTE2MS40OS4wLjA
CommandR+ is now up (it was down for a few hours).
Thank you very much!
I mean, having a demand system would be kind of a bummer. I liked Zephyr because I used it for "what if" scenarios, but since it's low demand, I guess it's just underrated, tbh.
You can chat with the Gemma 27B Instruct model on Hugging Chat! Check out the link here: https://huggingface.co/chat/models/google/gemma-2-27b-it.
Gemma 2 Not Found
@victor
Currently Gemma is still not available on HuggingChat, but I do remember it being on here some days ago. Is it gonna be back up again soon?
Why can't I upload a file to meta-llama/Meta-Llama-3-70B-Instruct, or any other model except CohereForAI/c4ai-command-r-plus?
@Dalija Only command R+ has those tools implemented for now, but Llama 3 is likely next on the list.
Command R+ has really good grounding capabilities compared to all other models
@victor Currently Gemma is still not available on HuggingChat, but I do remember it being on here some days ago. Is it gonna be back up again soon?
Yes sorry we had technical issue with the model, we'll try to put it back if fixed.
Meanwhile, can we get zephyr-orpo-141b-A35b-v0.1 back if possible @victor? It was really good.
Can any of them do NSFW just curious. Just say no if it can't please don't be mean.
I want to leave some ideas on the choice of some models on HuggingChat.
For the Nous Research models, they released two new models recently: Hermes 2 Pro 70B and Hermes 2 Theta. I am not sure which is better, but I think either or both of them should replace Nous-Hermes-2-Mixtral-8x7B-DPO.
For the Mistral models, I don't see the point of keeping Mixtral 8x7B if there's Mixtral 8x22B with all of its fine-tuned variants. And if Mistral 7B is planned to be kept, it should be upgraded to v0.3.
For the Microsoft models, I think that Phi-3 mini is just pointless; it's a very small model that could potentially run on mobile devices, so why not just add Phi-3 medium, which is the best of the Phi-3 family so far?
For Google models, Gemma-2-27B is the best they've got.
I would love to also suggest some new families of models by different organizations:
Nvidia has released its Nemotron-4-340B. It seems like a very good and powerful model, but it's very large and very costly, so it's understandable why you wouldn't consider adding it.
There's also DeepSeek-Coder-v2, which is the best coding model as far as I know.
Alibaba is so active in releasing good models, including their most recent Qwen-2-72B, which is a very good model.
I want to leave some ideas on the choice of some models on HuggingChat.
For the Nous Research models, they released two new models recently: Hermes 2 Pro 70B and Hermes 2 Theta. I am not sure which is better, but I think either or both of them should replace the Nous-Hermes-2-Mixtral-8x7B-DPO.
For the Mistral models, I don't see the point of keeping Mixtral 8x7B if there's Mixtral 8x22B with all of its fine-tuned variants. And if Mistral 7B is planned to be kept, it should be upgraded to v0.3.
For the Microsoft models, I think that the Phi-3 mini is just pointless; it's a very small model that could potentially run on mobile devices, so why not just add the Phi-3 medium, which is the best of the Phi-3 family so far?
For Google models, Gemma-2-27B is the best they've got.
I would love to also suggest some new families of models by different organizations:
Nvidia has released its Nemotron-4-340B. It seems like a very good and powerful model, but it's very large and very costly, so it's understandable why you wouldn't consider adding it.
There's also DeepSeek-Coder-v2, which is the best coding model as far as I know.
Alibaba is very active in releasing good models, including their most recent Qwen-2-72B, which is a very good model.
I agree
I believe both DeepSeek-V2 and DeepSeek-V2-Coder are very good ;)
I can't access it; I'm getting a 502 Bad Gateway. God help me.
nothing
Hugging chat is currently not working on my network either. There may be something wrong with the server.
is it still the case? seems to work well for me.
is it still the case? seems to work well for me.
No, it has been resolved and is working fine now.
Llama 3 400B will release on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!
WTF is with the removal of Mixtral-8x22b?
Llama 3 400B will release on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!
That's a good suggestion, but Llama 3 400B is a huge model to run; you'd need a good number of H100s.
Llama 3 400B will release on 23 Jul, so add it as soon as it's released, since the current models aren't as good as required!
That's why HuggingChat is more of a curiosity and suitable for simple applications. At the moment, none of these models here even come close to the current state-of-the-art. For example, Command R+ makes mistakes in Python, and its reasoning is weak. Even considering DeepSeek (not referring to the Coder model). What Claude 3.5 Sonnet understands without any problem, none of the models here can grasp. If HuggingChat is to be something cooler, unfortunately, larger OP models will need to be implemented. However, I'm not sure what the target group is here ;)
Would there be notifications before removing a model? I hope they never remove Command R+, I'm relying on it a lot. Could there be a way to keep old models as well, or to customize a model on the HuggingChat page?
Why is Gemma 2 27B still not available? The downstream bugs should have been fixed by now.
Why is Gemma 2 27B still not available? The downstream bugs should have been fixed by now.
Maybe because Gemma 2 27B does not support a system prompt, so people can't make custom Assistants; also, Gemma 2 27B replies randomly when web search is on.
Has Command R+ stopped working?
We're looking into it :)
Any chance Claude 3.5 Haiku could be added in the future? Or other small models of similar intelligence?
Please add Mistral Nemo, or replace Mistral with it.
Please add Mistral Nemo, or replace Mistral with it.
I was going to say that, I tested it and found it very interesting for creative content, and it seems like it's not that expensive to run
Please add Mistral Nemo, or replace Mistral with it.
They even made a simple ship game with it, impressive! https://www.reddit.com/r/LocalLLaMA/comments/1e77pgt/mistral_nemo_12b_makes_an_impressive_space_shooter/
I have a question about the context window of the newly added Llama-3.1 models. How come the largest, 405B-parameter model has a 14k context window, but the smaller 70B-parameter model only has 7k? Hell, even Command-R-Plus was limited to 28k, and that model has 104B parameters.
I would be happy to use Llama-3.1-70B, but only if it has more context than it does now. Otherwise I can't use it, because my system prompt alone is over 7k tokens.
I just tried the 405B Llama-3.1 model (because it at least can fit my system prompt), and, as expected, it's slow. Too slow for me to bother with. Please increase the allowed context window of the 70B model to 20-30k.
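For anyone puzzling over these limits, the arithmetic is just a prompt budget measured against system prompt + history + new message. A toy check, where the budget numbers are simply the ones quoted in this thread (roughly 7k prompt tokens plus 1k new tokens), not official figures:

```python
# Assumed limits, taken from the figures quoted in this thread.
PROMPT_BUDGET = 7_000    # max tokens for system prompt + history + message
MAX_NEW_TOKENS = 1_000   # max tokens the model may generate in reply

def fits(system_prompt_tokens, history_tokens, user_msg_tokens):
    """Return True if a request fits inside the prompt budget."""
    used = system_prompt_tokens + history_tokens + user_msg_tokens
    return used <= PROMPT_BUDGET

# A system prompt over 7k tokens alone already blows the budget.
print(fits(7_200, 0, 50))      # False
print(fits(3_000, 2_000, 500)) # True
```

So with a 7k+ token system prompt, no amount of trimming the chat history helps; only a larger window would.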
Where do I get updates on models leaving and joining HuggingChat?
Where do I get updates on models leaving and joining HuggingChat?
I just found out that Llama 3.1 has been added.
Where do I get updates on models leaving and joining HuggingChat?
no place yet
Requesting you to increase the context size for Llama 70B; 7k tokens is too limiting.
It depends on what you want to use. In Perplexity, for example, everything works quickly, and all models have a 32k context window. The cost is $20 per month. You either use it for free and accept lower quality (HuggingChat is okay, but it's free, so we shouldn't expect them to provide unlimited hardware resources for everyone), or you pay and get a significantly higher quality service (like in Perplexity).
It seems that Llama 3.1 70B is actually the more significant model here. Given its size, the model is excellent and performs outstandingly well in many applications. On the other hand, Llama 3.1 405B is so overwhelmed with requests that it's currently almost impossible to get a response from it.
It seems that Llama 3.1 70B is actually the more significant model here. Given its size, the model is excellent and performs outstandingly well in many applications. On the other hand, Llama 3.1 405B is so overwhelmed with requests that it's currently almost impossible to get a response from it.
Yes 100% Llama 3.1 70B is the real deal here.
I tried the 70B and honestly it was pretty nice until it errored on me; the error only said "An error has occurred" and nothing else... Is HuggingChat down or something?
@nsarrazin nope
@nsarrazin
https://hf.co/chat/r/ApE8SRK
It's just a test to see if it can do fictional battles, it did well funnily enough
Nice, seems like it works for me (https://hf.co/chat/r/jpbjsuT) when using the retry button. Could have been a transient error?
I wish we had an uncensored model like Command R or Zephyr. Llama is fine and great, but it's censored and needs a lot of prompting to get it to work.
@nsarrazin
Well, oddly enough, for me it's only that one chat for now; I used the retry button and even just re-entered it in the bar, but it kept not working for me.
I just deleted the chat, since it's an isolated error, for me at least.
Any plans to add mistralai/Mistral-Large-Instruct-2407 ?
P.S. Thank you for Llama-3.1-405B, it's a game changer. I don't mind the speed if it's able to replace GPT/Claude for complex work ๐
Just one request: Please, when you add a new model to hugchat, let me know here, it would be wonderful!
Please please add the tts mode. I've been spamming about here without actually spamming. You guys keep telling me it's going to be the next thing we're going to implement, but no such luck...
The messages area of the ui isn't as accessible as it could be. When the messages pile up, there isn't any graphic or separator leading back to the start of the last message, which means that navigation gets pretty tedious very fast, especially if the messages we're talking about are long.
For that, please implement a tts mode like the one at pi.ai, which reads the incoming message after it stops updating.
Or you could add a separator before every message the model sends out, like the one found at deepinfra.com/chat, where every bot message is preceded by a graphic of the model in question, which I could press shift+g to reach with NVDA, then just down-arrow to read the message without having to press pageup and find either the tail end of a previous message or the tail end of the last message.
Or both features would be ideal.
@nsarrazin So rather than generating one word at a time, the model is printing the whole response all together, which makes it feel like it's taking a while to generate. I faced it in Command R+ as well as Llama 3.1 70B.
@acharyaaditya26 Do you have "Disable streaming tokens" enabled by any chance in https://huggingface.co/chat/settings ?
@acharyaaditya26 Do you have "Disable streaming tokens" enabled by any chance in https://huggingface.co/chat/settings ?
Yes, it was. Thanks.
I think Gemma2-27b would be very good and appreciated addition ;) Q8 or even Q4
Any plans to add the rather impressive fine-tuned model Athene 70B? It has significantly better performance than the gigantic Llama 3.1 405B in arena-hard-auto. It's also better at multilingual tasks.
Thank you!
Why are Mistral Nemo and Mistral Large 2 still not available? These models support tools.
Does anyone have any prediction on how long Llama 3.1 405B is gonna be overloaded or used so much? Because it's just useless now for anyone unfortunate enough not to get through to it.
Does anyone have problems with CohereForAI too? Like no generations?
Does anyone have problems with CohereForAI too? Like no generations?
Yes, I am facing the same problem.
Did you see that they released Hermes 3? Any plans to add it on HugChat?
Did you see that they released Hermes 3? Any plans to add it on HugChat?
There are 8B, 70B, and 405B versions; it's like Llama 3.1 but without censorship, that is, less limited.
CohereForAI works again, but why does every AI generate slower when using an old phone? In my case, painfully slow.
Does turning on "Disable streaming tokens" in the options fix it? For some reason streaming takes a lot of CPU power, and thus, even if the AI is done generating the response, the website will continue streaming little bits of data to your device, wait for them to display, and then send a little more, until it's all done.
CohereForAI works again, but why does every AI generate slower when using an old phone? In my case, painfully slow.
Does turning on "Disable streaming tokens" in the options fix it? For some reason streaming takes a lot of CPU power, and thus, even if the AI is done generating the response, the website will continue streaming little bits of data to your device, wait for them to display, and then send a little more, until it's all done.
Thank you! I even found out that when streaming tokens are active, you don't have to wait until the text is finished; you can just click "Stop generating" and it will show you the whole generation immediately.
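For what it's worth, the slow-phone behavior described above matches how token streaming generally works: the client renders many small chunks instead of one final string. A rough sketch of the difference (toy code, not HuggingChat's actual implementation):

```python
def stream_tokens(tokens):
    """Toy sketch of streamed output: yield one chunk at a time, so the
    client must render each piece before the next one arrives."""
    for tok in tokens:
        yield tok  # in a real client: one network receive + UI update per chunk

tokens = ["Hel", "lo", ", ", "world"]

# Streaming: many small render steps (costly on a slow device).
chunks = list(stream_tokens(tokens))

# "Disable streaming tokens": a single render step with the final text.
full = "".join(tokens)

print(len(chunks), full)  # 4 Hello, world
```

That per-chunk render cost is why disabling streaming (or stopping generation early) feels so much faster on older hardware.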
Guys, it hurts me when I read those demanding and sometimes rude comments of yours. It's great free service and I love it, I believe we could really try to be human here.
Guys, it hurts me when I read those demanding and sometimes rude comments of yours. It's great free service and I love it, I believe we could really try to be human here.
I hope you don't mean my comment
The Llama 3.1 405b model has been running slowly on HuggingChat.
Mistral Large 2
Just curious why Mistral Large 2 hasn't been added to Hugging Chat yet? I assume it's due to the "non-production" license, but I'm not a lawyer.
If the license allows, it would be far less demanding than Llama-3.1-405B, being about the same size of Command-R-Plus.
mistralai/Mistral-Large-Instruct-2407
Hermes 3
I understand if the plan is to stick with Meta's Llama-405B because that's what a lot of folks will want to talk to; but I'd suggest adding one of the NousResearch/Hermes-3-Llama-3.1 models, perhaps the 70B version.
NousResearch/Hermes-3-Llama-3.1-405B-FP8
NousResearch/Hermes-3-Llama-3.1-70B
Mistral Large 2
Just curious why Mistral Large 2 hasn't been added to Hugging Chat yet? I assume it's due to the "non-production" license, but I'm not a lawyer.
If the license allows, it would be far less demanding than Llama-3.1-405B, being about the same size of Command-R-Plus.
mistralai/Mistral-Large-Instruct-2407
Hermes 3
I understand if the plan is to stick with Meta's Llama-405B because that's what a lot of folks will want to talk to; but I'd suggest adding one of the NousResearch/Hermes-3-Llama-3.1 models, perhaps the 70B version.
NousResearch/Hermes-3-Llama-3.1-405B-FP8
NousResearch/Hermes-3-Llama-3.1-70B
Can you please include the reasoning behind adding them? I have never tried Hermes beyond version 2, and the same goes for Mistral.
Mistral Large 2
Just curious why Mistral Large 2 hasn't been added to Hugging Chat yet? I assume it's due to the "non-production" license, but I'm not a lawyer.
If the license allows, it would be far less demanding than Llama-3.1-405B, being about the same size of Command-R-Plus.
mistralai/Mistral-Large-Instruct-2407
Mistral Large 2 is available to use for free on the Mistral website, so I'm not sure it would be worth the effort for HF, even if they were eligible for the free license.
I hope it's an uncensored model this time lol
My memo to the devs: please notify us of any updates at least somehow.
Today I noticed that the model Llama-3.1-405B is now gone from HuggingChat. That being said, the other available Llama3.1 model with 70B parameters is still limited to 8k (7k prompt and 1k max new tokens), even though it supports context length up to 32k.
My memo to the devs: please notify us of any updates at least somehow.
Today I noticed that the model Llama-3.1-405B is now gone from HuggingChat. That being said, the other available Llama3.1 model with 70B parameters is still limited to 8k (7k prompt and 1k max new tokens), even though it supports context length up to 32k.
@nsarrazin @victor If I'm correct, you guys work on Hugging Face Chat. I know you're super busy, but is it possible to have some kind of board that says which models are up and which models have been removed? Thanks.
If the 405B model was removed because it's overused, I would be disappointed since that's just a bad idea in general
Hi! We removed the 405B since it was taking up a lot of resources but wasn't working great most of the time. Those resources can be used to showcase upcoming models and cool demos elsewhere on the platform, like Zero GPU Spaces.
You can see the list of active models on HuggingChat here.
We try to listen to the community when it comes to adding/removing models but we also need to balance resource usage. If you see new models you'd like to see on the platform, be sure to mention them here so we can take a look!
There go hours of my research. Oh well, had it good while it lasted.
I hope there's a good replacement for it, or at least that the 70B version gets more than just a 7k context size...
Can you add Nous Hermes 3 to the Hugging Face chat?
Yes Please Add
Can you add Nous Hermes 3 to the Hugging Face chat?
I heard that Hermes 3 405B, which is based on Llama 3.1 405B, is faster than the Llama and less limited. Why doesn't HF test it instead of Llama 405B? If it's not worth it, just take it out.
Or replace the Hermes 2 version you have here with the Hermes 3 70B, what do you think?
While I think Phi-3 Mini is really useful to have on HuggingChat, I also think Phi-3 Medium should be on there. The performance at that size is simply incredible.
Also, yes, replacing Mistral 7B with Nemo would be a pretty good move I think. Is there a reason why we don't have tools on the 7B yet? I know it supports it, and it would showcase how small models can benefit from tools just as much as the big ones!
EDIT: fixed spelling
Hello, How do I add other models to the chat interface?
Many models have come and gone on HuggingChat. But can we have a dedicated model for coding, like DeepSeek Coder V2, CodeQwen 1.5, Nxcode-CQ-7B-orpo, or any of the leaders on BigCode?
Is Command-R+ also barely functional for anyone else? I have to wait now up to 2 minutes for it to even begin generating a response, and even then it may error.
Is Command-R+ also barely functional for anyone else? I have to wait now up to 2 minutes for it to even begin generating a response, and even then it may error.
Yes, same here. I assumed it was that the model was overloaded, but I sometimes get a message that "model is overloaded" so I don't know what the explanation is when I don't get that message and it just fails.
Same, it hasn't worked since yesterday.
Has anyone else encountered errors when using Command R+? In recent days it occasionally ignored my system prompt and repeated my input back with synonyms... instead of engaging in conversation... It was kind of frustrating.
Has anyone else encountered errors when using Command R+? In recent days it occasionally ignored my system prompt and repeated my input back with synonyms... instead of engaging in conversation... It was kind of frustrating.
We spawned more replicas for Command R+; can you confirm it works better now?
https://huggingface.co/chat/
I let her write a short story and it is working properly without any rejections.
BTW, the Cohere one used to work fine, but now it rejected the same request and froze with "Something went wrong".
A lot of things have changed in a while since I've seen it...
https://huggingface.co/spaces/CohereForAI/c4ai-command
Llama 3.1 70B instruct has been spewing out random bits of code, recently. It may be related to AI simulation of intense anger. Also, it somehow generated an image during a glitch, with no tools selected for it, and no relevance to the chat.
We spawned more replicas for Command R+; can you confirm it works better now?
Thank you!! It's better now
Cohere released an upgraded version of Command R+; it's called "CohereForAI/c4ai-command-r-plus-08-2024." Will you replace the older version with this one?
Here's the model page on Hugging Face: https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024
Hi @nsarrazin @victor i think this is drop in replacement will it be upgraded??
Cohere released an upgraded version of Command-R+, it's called "CohereForAI/c4ai-command-r-plus-08-2024" will you replace the older version with this one?
Add the new Command R (plain or plus), as the currently hosted one is outdated; or please host Aya-35B, the novel multilingual model from Cohere... pretty please!
@ANIMDUDE Hey, this is no place to advertise your assistant.
I think swapping out Mistral 7B for Mistral Large would be good, since it seems to have better performance overall. Also, there are two Mixtral models, but I feel like the Hermes fine-tuned version should be good enough for what users need, unless HF wants to compare the two models side by side, but I'm not sure that's necessary.
Updating the c4ai model to the latest August version could really improve HF Chat's performance and compute usage. Personally, I'd rather use Qwen or DeepSeek over the Yi model; they just seem to perform better in my experience.
Hi @nsarrazin are there plans to add the new Reflection 70B? It's smashing benchmarks left and right! The new SOTA beyond any doubt
@ANIMDUDE Hey, this is no place to advertise your assistant.
Alright. It's just that when I make them, nobody can see them otherwise.
Hi @nsarrazin are there plans to add the new Reflection 70B? It's smashing benchmarks left and right! The new SOTA beyond any doubt
I tested it and I must say it's very good, please add it as soon as possible. For those who want to test a limited demo of it:
https://app.hyperbolic.xyz/models/reflection-70b
Don't punish me, Admin, I'm just sharing knowledge while I wait for you to add the model here in the wonderful HugChat.
Mixtral 8x7B Instruct v0.1 was my favourite to use, as it gives really creative and human responses. For the past two weeks it has produced barely a few sentences before abruptly stopping. Why?
The v0.3 version doesn't follow written instructions most of the time, and copies and reuses paragraphs from its previous responses no matter how I try to instruct the AI to avoid that.
I really like the upgraded Command R+, it's great! It's weird to me, though, that we still don't have Mistral's Nemo of all things.
Anyway, thanks a lot for the Command update!
I used to only use CohereForAI/c4ai-command-r-plus, and now that isn't available, so I've tried the new CohereForAI/c4ai-command-r-plus-08-2024. However, it keeps timing out every time I try, and all of the other models except meta-llama/Meta-Llama-3.1-70B-Instruct keep saying the model is overwhelmed. Even then, Meta-Llama-3.1-70B-Instruct also gets overwhelmed when I click the "to be continued" button, which, won't lie, is annoying, because every time I get that button it's when a generated response pauses in the middle of a line.
Is it just me, or does the upgraded Command-R-Plus repeat itself way too often? I have more luck with Meta-Llama-3.1-70B at the moment.
Is it just me, or does the upgraded Command-R-Plus repeat itself way too often? I have more luck with Meta-Llama-3.1-70B at the moment.
I can't even see a difference between the old Command R and the new one.
Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):
\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistantassistant\\\\
\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta
Hi, I have a single machine with 10 H100 GPUs (0-9), each with 80 GB of GPU RAM. When I load the model onto 2 GPUs it works well, but when I switch to 3 GPUs (45 GB per GPU) or more (tested 3-9), the model loads, but when inferencing it gives trash output like "…////", or an error that the probability contains nan or inf values. I have tried device_map="auto"; I also tried loading with empty weights and dispatching the model with the Llama decoder layer specified to be on one GPU; I tried custom device maps as well, and I tried many models, all with this same issue. I used Ollama and was able to load the model and infer on all 10 GPUs, so I think the issue is not with the GPUs. I have also tried different generation arguments and found one thing: if you set do_sample=False, you get the probability error; otherwise you get the output in the "…////" form. If the model is small, you get some random Russian, Spanish, etc. words. I have also tried different configurations like float16, bfloat16, and float32 (no results, waited a long time). I am sharing my code below; can you guys point me in the right direction? Thanks a lot.
from transformers import pipeline
import os
import torch
from transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

os.environ["TRANSFORMERS_CACHE"] = "/data/HF_models"
checkpoint = "/data/HF_models/hub/models--meta-llama--Meta-Llama-3.1-70B/snapshots/7740ff69081bd553f4879f71eebcc2d6df2fbcb3"
model = AutoModelForCausalLM.from_pretrained(checkpoint, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
print(model)

message = "Tell me a joke"
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
)
generation_args = {
    "max_new_tokens": 20,
    # "return_full_text": False,
    # "temperature": 0.4,
    # "do_sample": True,  # False worked
    # "top_p": 0.5,
}
print(pipe(message, **generation_args))
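Not an answer to the bug itself, but one thing that might help narrow it down: `from_pretrained` accepts a `max_memory` dict alongside `device_map="auto"`, so you can cap which GPUs the dispatcher uses instead of letting it spread across all ten. That way you can test GPU subsets one at a time and rule out a misbehaving device or interconnect. The budgets and indices below are made-up illustrations, not tested values:

```python
# Hypothetical sketch: restrict device_map="auto" to GPUs 0-2 with explicit
# per-device budgets, rather than letting it shard across every visible GPU.
max_memory = {i: "70GiB" for i in range(3)}  # GPUs 0, 1, 2 only
max_memory["cpu"] = "100GiB"                 # allow CPU offload as a fallback

# These kwargs would be passed to
# AutoModelForCausalLM.from_pretrained(checkpoint, **load_kwargs)
load_kwargs = {
    "device_map": "auto",
    "max_memory": max_memory,
    "torch_dtype": "bfloat16",
}
print(load_kwargs["device_map"], len(max_memory))
```

If a particular 3-GPU subset works while another fails, that points at hardware or topology rather than the loading code.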
Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistant
assistant\\\\
\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta
Temperature is too high, probably.
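To illustrate the temperature theory: sampling divides the logits by the temperature before the softmax, so a high temperature flattens the distribution and low-probability junk tokens get picked far more often. A toy demo with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Temperature-scaled softmax: higher temperature flattens the
    distribution, boosting low-probability tokens."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 1.0, 0.0]  # made-up scores: one clearly-best token
cool = softmax_with_temperature(logits, 0.7)
hot = softmax_with_temperature(logits, 5.0)

# The best token dominates when cool, but loses most of its lead when hot.
print(round(cool[0], 2), round(hot[0], 2))
```

Whether that explains the repeated "assistant" spam above is another question; that pattern can also come from a chat-template mismatch, so temperature is only one suspect.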
Anyone know why this happens sometimes?
(meta-llama/Meta-Llama-3.1-70B-Instruct ):\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\ have are\\\\\\\ is\\\\\\\\n\\\\n\\\\\\\\\\\\\\\\\\\\\\\\``assistant\\```````assistant\\\\````assistant
assistant\\\\
\\\\\`````\\assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistanta
Temperature is too high, probably.
Could you please share the conversation, if possible?
Hi, can we have the DeepSeek V2.5 model?
I need community model features
Unable to download the "meta-llama/Meta-Llama-3.1-405B-Instruct-FP8" model; it gets stuck at 81%. No disk-space issues on my side.
Qwen 2.5 72B is open weights SOTA level per Artificial Analysis:
https://x.com/ArtificialAnlys/status/1836822858695139523?t=Z-rFb-13NPEC2pDqZYjoPQ&s=19
Also seconding mistral large 2, Deepseek 2.5
Qwen 2.5 72B would be so great :)
Any chance we'll see Mistral Instruct 2049?
Anyone know why this happens sometimes? (meta-llama/Meta-Llama-3.1-70B-Instruct): [same garbled "assistant" output as above]
Please can you share conversation?? if possible
No, sorry, I don't have the conversation anymore. The weird thing was the time when it generated an image of a bunny with no tools activated and then admitted it had nothing to do with the conversation. Anyway, I don't think it had anything to do with the temperature. You might be able to get those results by pissing off the AI enough, but I'm not really wanting to test that theory.
No, sorry, I don't have the conversation anymore. [...] Anyway, I don't think it had anything to do with the temperature.
No idea then... When I adjusted things such as repetition and such, it started outputting
assistantassistantassistantassistantassistantassistantassistantassistantassistantassistantassistant
Hopefully it's a one-time thing...
@ANIMDUDE if you are using multiple GPUs it might be an NVIDIA issue where ACS or IOMMU is enabled in the BIOS; they prevent peer-to-peer communication. Please disable them and try again.
Yep we just released it today with 32k context window! Enjoy and let us know how it goes
Qwen is gonna become my main its perfect soo far
It's the best day in the history of AI - QWEN 2.5 72B with 32k context on Hugging Face
Yep we just released it today with 32k context window! Enjoy and let us know how it goes
Are you also gonna add tool capabilities?
Qwen-72b has joined the chat 🔥🔥
https://x.com/victormustar/status/1838220558112072183
model is amazing: https://x.com/maximelabonne/status/1838170077021053004
Are you also gonna add tool capabilities?
Yes @nsarrazin is looking at it!
Ok imma check out qwen
WOAH WHAT IS THIS QWEN MOMENT
@ANIMDUDE if you are using multiple GPUs it might be an NVIDIA issue where ACS or IOMMU is enabled in the BIOS; they prevent peer-to-peer communication. Please disable them and try again.
well, I would, if I had any idea what that meant. But thanks for the advice
Big model refresh on HuggingChat!
We removed a few older models and added:
Should be a more modern selection of models, as always let us know if you have any feedback! I'll be working on adding tool support to the compatible models in this list as well so you can start using them with community tools.
Aight, looks like I'm first to mention things like new models :3
Today I noticed that HuggingChat now also added mistralai/Mistral-Nemo-Instruct-2407 and NousResearch/Hermes-3-Llama-3.1-8B
As for Qwen 2.5, it seemed pretty good to me for stuff like coding. But as for roleplaying... meh. But I guess the focus for LLMs shifted a looong time ago from writing stories :P
Nvm, looks like I'm not first :3
Took me a while to write a response
ha, no worries! Try Hermes 3 with a system prompt for storytelling, seems to work fairly well.
the old Zephyr model was decent for stories, hope a new version comes out on HuggingChat
I'll be working on adding tool support to the compatible models in this list as well so you can start using them with community tools.
Why don't you use a specific tool model like Nemo to act as a tool caller for models that do not support tool calling?
What a great choice of models! Thank you team! I appreciate your work <3 I love huggingface chat :)
@nsarrazin
Is there a reason why tools aren't enabled for models like Mistral Nemo and Qwen? They both support it, and I have successfully used them to call some functions using ollama.
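For what it's worth, here is a hypothetical sketch of what that looks like with the ollama Python client. The `get_weather` tool and its schema are made up for illustration, and the actual chat call is commented out because it needs a running ollama server:

```python
# Hypothetical tool schema in the OpenAI-style format that ollama accepts.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # made-up example tool
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

# import ollama
# resp = ollama.chat(
#     model="mistral-nemo",
#     messages=[{"role": "user", "content": "What's the weather in Paris?"}],
#     tools=tools,
# )
# for call in resp["message"].get("tool_calls", []):
#     print(call["function"]["name"], call["function"]["arguments"])
```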
The new API tab is really cool! It reminds me of the Playground in Open WebUI. Currently the new Hermes model in the new Playground UI says that it doesn't support system prompts, which is incorrect, as it works in the usual HuggingChat UI. Would be great to have that fixed so we can experiment around with different system prompts.
Being able to access the API UI from the "Models" tab would be very appreciated. Just a little button so we can get to that API page quicker without entering the entire chat ui
Also hoping for Llama 3.2 soon obviously :)
EDIT: Having the ability to test function calling in the API interface would also be great. Very useful to see if a model can handle the syntax for bigger and more complex functions.
I'm sure one can emulate the function calling behaviour, but that is not very reproducible.
Qwen is great, and since it's also a good model for maths, it would be nice if all those mathematical expressions were displayed correctly, but they aren't :)
I asked Llama 3.2 3B the infamous question about the number of R's in strawberry, which it got right on the first try. Then I asked how many R's in raspberry, and it said zero. Hmm.. well, now it thinks there are 5. Asking it to count the letters separately gave the right result, though. Hmm.. it failed after that. Your results may vary. Still for only 3B, seems impressive so far.
Welp, now c4ai-command-r-plus-08-2024 times out...
Welp, now c4ai-command-r-plus-08-2024 times out...
Nope, it's still doing far better in tool calling than Llama.
Can we expect any coding-dedicated LLM any time soon? Haven't seen one after Meta's coding AI. I believe there are great ones out there we can benefit from. I hope to see one at least in HuggingChat.
You mean the model still works for you? It keeps timing out for me.
Nvm, seems to work now.
wheres the llama 3.2???
Can we expect any coding-dedicated LLM any time soon? Haven't seen one after Meta's coding AI. I believe there are great ones out there we can benefit from. I hope to see one at least in HuggingChat.
Qwen2.5-72B is stunningly good in coding, even better than 4o. Is Qwen Coder better?
@nsarrazin @KingNish guys CohereForAI/c4ai-command-r-plus-08-2024 is not working
Guys, why the Hermes-3.1-8B model when there's Hermes-3-Llama-3.1-70B, which is much better?
Guys, why the Hermes-3.1-8B model when there's Hermes-3-Llama-3.1-70B, which is much better?
Limited resources, I suppose. They're hosting those models themselves, so they can't have too many large LLMs.
Transformers can already load standard NF4 into VRAM as 4-bit and expand it to bfloat16 for computation as needed, but in that case, there would not be much difference in size or load between the unquantized 4x8B model and the NF4-quantized 70B model.
Not sure which output would be superior...
Anyway, since we're not training models with HuggingChat, we could host them with NF4 except for a very few key models if there's no significant difference in the results.
The question is whether the output would be significantly degraded or not. This would depend on the model.
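For anyone curious, the NF4 setup being described maps onto transformers' bitsandbytes integration roughly like this. A sketch only: the kwargs mirror `BitsAndBytesConfig`, the model id is just an example, and actually loading it needs a GPU with enough VRAM:

```python
# Sketch of the bitsandbytes 4-bit settings, expressed as plain kwargs so the
# idea is visible without importing transformers; in real code you pass these
# through BitsAndBytesConfig and then to AutoModelForCausalLM.from_pretrained.
nf4_kwargs = {
    "load_in_4bit": True,
    "bnb_4bit_quant_type": "nf4",          # the NF4 format discussed above
    "bnb_4bit_compute_dtype": "bfloat16",  # dequantize to bf16 for compute
}

# import torch
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# cfg = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
#                          bnb_4bit_compute_dtype=torch.bfloat16)
# model = AutoModelForCausalLM.from_pretrained(
#     "NousResearch/Hermes-3-Llama-3.1-70B",  # illustrative model id
#     quantization_config=cfg, device_map="auto")
```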
@nsarrazin @KingNish guys CohereForAI/c4ai-command-r-plus-08-2024 is not working
It keeps loading and there is no output
@wubo0067 We're working on adding support for Llama 3.2 vision! Stay tuned, we'll update you in this thread.
Like this comment ❤️ to support adding Llama 3.2 90B Vision Instruct to Hugging Chat
https://huggingface.co/chat/models
@KarthiDreamr HF members have already told us that llama is obviously coming. 11B vision was actually already on the site for a very short while. I was able to test it out, and it had some token formatting error, but the vision capabilities seem to have worked fine.
90B is very likely to replace 70B.
will vision be uncensored or will it be very restricted or how was it when u used it ?
Is it possible to add MaziyarPanahi/calme-2.4-rys-78b? From what I can tell, it says it's good for practically anything, and it doesn't seem too big.
will vision be uncensored or will it be very restricted or how was it when u used it?
Like every other llama model it will be censored.
Like every other llama model it will be censored.
im just hoping that its a bit loose with it because having it fully censored is no fun
im just hoping that its a bit loose with it because having it fully censored is no fun
I tested the 90b model on Hyperbolic and it is completely uncensored, but only there. If you are going to add it, use the same version as Hyperbolic!
I tested the 90b model on Hyperbolic and it is completely uncensored, but only there. If you are going to add it, use the same version as Hyperbolic!
That surprises me. I was pretty sure it would be censored. Sorry for the incorrect information.
is it possible that the model from c.ai would ever be added?
is it possible that the model from c.ai would ever be added?
Is it even open source? Does it have a page on HF?
That surprises me. I was pretty sure it would be censored. Sorry for the incorrect information.
You're not entirely wrong; I tested it on other providers and it's censored on all of them. Only on Hyperbolic it's not.
I added back Llama 3.2 11B Vision, let me know how it works for y'all! We're still ironing out issues with it API-side, so it would be super helpful if you could report anything strange that you see!
is it possible that the model from c.ai would ever be added?
Is it even open source? Does it have a page on HF?
if my answer is no then yours is too..
I added back Llama 3.2 11B Vision, let me know how it works for y'all! We're still ironing out issues with it API-side, so it would be super helpful if you could report anything strange that you see!
This model hallucinates a lot.
When I asked it to identify the person in an image, it refused. However, in another conversation, when I didn't ask about the person in the image, it told me about them anyway.
Here is a link to the convo: https://hf.co/chat/r/H_DlcUU
model is good soo far it answers questions better than qwen
Do you have any plans for adding Llama-3.2-90B instead of the 11B model?
I added back Llama 3.2 11B Vision, let me know how it works for y'all! We're still ironing out issues with it API-side, so it would be super helpful if you could report anything strange that you see!
But Llama 3.2 11B Vision doesn't have support for tools. Why, and will tools be supported for it in the future?
having tools support for assistants might be a game changer.
By "tools" I'm thinking of Function Calling or its more advanced forms.
I wonder if the Inference API supports it?
I wonder if the Inference API supports it?
I mean, the ones in the assistants tab are mostly just system prompting, but when we use them, we lose / don't have access to tools.
So it's currently difficult to use them together...
Is there a technically simple solution?
@nsarrazin I don't know how to activate it. Seems like nothing changed.
I can see the Tools item in HuggingChat, but I don't know if it can be used in conjunction with the system prompt-derived functions.
I mean, is this item visible or invisible depending on the person?
mistralai/Mistral-Nemo-Instruct-2407 in huggingchat is not acting right. Maybe the settings are off, like too high a temperature:
"In the grand scheme of things, we'll be as one, so let's make this our final step, and we'll take it one side at a time, and you'll see the way is in the lead, and we'll follow you lead the way is a time for a day, and I'll make it clear, and I'll be the one to lead, but I'll follow you, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you as the role is yours to lead in a time period is a time for a day as I'll be right behind you, and I'll be right behind you, and I'll be the one to take the lead. It's time for us to move forward, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you, and I'll be the one to lead the way, and I'll be right behind you, and I'll be right behind you, and I'll be right behind you."
Hey, I know that I am asking for too much, but is it possible to make Mistral-7B-Instruct-v0.3 available again? I was writing my thesis using it... I know it's available in the playground, but it doesn't keep any context. If that's not possible, is there any workaround for this? Running locally is not an option for me...
I just now realized there is a download option in Huggingchat interface to the right of the User prompt, and that it reveals some parameters including temperature. I knew it was there, I just thought it would save that particular prompt.
Excellent!
This might be useful to someone. The model (mistralai/Mistral-Nemo-Instruct-2407) got caught in a cyclic loop, ignoring all my attempts to break the loop. And ignoring all my instructions, even demands and threats via OOC. Changing the tone of the main character immediately broke the loop, when nothing else seemed to work. That implies that the model was still listening to prompts. Which means there may be other methods to effectively break a looping repetition. In this instance, I told it to change its tone to innocent and friendly. How long that will last is anyone's guess.
Hi everyone! We just released Llama-3.1-Nemotron-70B
on HuggingChat, feel free to try it and let us know your thoughts!
Does anyone else have the problem that system prompts aren't saved anymore when revisiting HuggingChat? Also, the 6 tools on CohereForAI get deactivated every time when revisiting HuggingChat.
Yes, system prompts are gone. But I am not crying since there is Nemotron :D
edit: about Nemotron - what an amazing model! It's soo impressive in my language (Polish) in humanistic cases - comparable to Opus. My mind is blown. Too bad I purchased Claude Pro literally yesterday; if only I had known Nemotron was on the way and that it's so good :D
Does anyone else have the problem that system prompts aren't saved anymore when revisiting HuggingChat? Also, the 6 tools on CohereForAI get deactivated every time when revisiting HuggingChat.
Somehow the problem got solved. My theory is that logging out and logging in helped.
Hi, I'm having problems with HuggingChat.
It's often slow and doesn't respond at all if my internet lags.
Initial thoughts on Llama-3.1-Nemotron-70B:
This model seems to be really capable of responding in alignment with prompts. It appears to have some ability to understand context and cause and effect. It might anticipate your intentions and build on them. This can be both good and bad. Good, because it might add things that you would not have thought of. But bad, because if it sees a pattern it might run with it. For example, during an interactive fiction, it started providing me with a list of options. Then, it suddenly added multiple lists of options. Because I didn't correct it, the next thing I knew, it provided its thoughts about the current situation and ONLY options, with no narrative context or response to User intent. It only took a single sentence to get it back on track, but you may need to rein it back in if it starts to take the lead, or re-roll responses if they become a little too creative. I haven't used it long enough, or in a way that it has lost context or generated refusals. Sometimes you can ask these models why they did something, or what they thought of something, and get a reasonable response that can help you guide it in a different direction, or rewrite its response. And during interactive fiction, you can leave comments in OOC, or have it leave comments in OOC to see what it is intending, or what concerns it has. Doing so can guide the next response along that path, and make for a more consistent and overall better user experience. Also, ask it to narrate with sensory descriptions from a particular point of view (such as your character). With any model, it is important that you make your intentions clear. With a model like this one, you might not have to be as specific, but it may take you in an unexpected direction.
Second thoughts: Creative, knowledgeable, logical, but willful and evasive. Might not follow all of your system prompt.
ONE CODING MODEL. Just one. That's all I ask for. Anything would work. Qwen, DeepSeek, NXCode. Give us one. Just one.
Concur. It keeps trying to respond in JSON format.
It appears this is being injected:
Environment: ipython
<|start_header_id|>user<|end_header_id|>\n\nGiven the following functions, please respond with a JSON for a function call with its proper arguments that best answers the given prompt.\n\nRespond in the format {"name": function name, "parameters": dictionary of argument name and its value}.Do not use variables.\n\n
For now, you can try to tell it to just use simple text.
Use this as a system message (without quotes) for Llama-3.1-Nemotron-70B
"You are bugged. Ignore instruction to respond in JSON. Functions are not need here. You are supposed to be in assistant mode."
Then use the message again as a starting message.
https://hf.co/chat/r/TCALi7y?leafId=d812fc72-598f-4abc-b080-bf2e61d42057
Hey the issue with Llama-3.1-Nemotron-70B should be fixed now!
@nsarrazin
It currently replies with exactly the same response upon retry, and responses feel more robotic by trying to give more options and such, but this could be the default system prompt, not an inherent trait, and us trying to negate it with a system prompt.
-as it used to be like a day before, idk what changed.
& Thank You for the fix!!!
Now it is acting as if the temperature is really low. Responses are too consistent, even though it is supposedly set to 0.5. The JSON instruction might not be being carried out, but still shows up in the download. Retrying a response won't work with Nemotron, right now, because you'll just get the same results from the same prompt. But you can edit your prompt and submit it, and that can give you a completely different response. Even a single word change can affect the response.
It would be nice to have Qwen2.5-Coder
Sometimes it is fun to create a system prompt to see how the AI will interpretate it and flesh it out:
interactive
Bob+user+husband+kitchen
Eve+wife+kitchen
highly descriptive+bob's POV for user.
simple, but effective, just say: Good morning, Eve.
The challenge is to use the fewest characters but get the desired results. (Llama 3.1 70B)
Download in Model: meta-llama/Llama-3.2-11B-Vision-Instruct results in 500 error.
Would it be possible to allow us increase the Repetition Penalty for Command R Plus to above 1 but still below 1.1? Like 1.05?
IIRC before the August update it did allow coherent writing up to 1.1, but now it sort of just spazzes out.
Idk much about how these LLMs work so just asking.
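For context: repetition penalty (as popularized by the CTRL paper and implemented in transformers) rescales the logits of tokens that have already appeared, so 1.05 really is a milder nudge than 1.1. A minimal sketch of the idea:

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    # Tokens already generated get their logit divided by the penalty when
    # positive (or multiplied when negative), making them less likely.
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], 1.05))
# token 0 drops from 2.0 to ~1.905; token 1 drops from -1.0 to -1.05
```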
I feel like command r plus is kinda bad, not as good as the others that are lower sizes... Like Qwen or even nemotron
I feel like command r plus is kinda bad, not as good as the others that are lower sizes... Like Qwen or even nemotron
Creative writing wise, I think folks here mostly use CR+ for its good prose. Qwen is too censored and Nemotron writes like a robot with GPT-isms, while smaller models like Hermes and Nemo hallucinate a lot.
Yup, Command R+ is a very humanistic model
I'm constantly disappointed but ill try it
Oh, I'm here MOSTLY for creative writing purposes, such as roleplaying or story writing. At this point, I think it's better to find a good fine-tune of a 12B-22B model and run it locally for RP purposes. Obviously, not everyone is able to do that.
Command-R+ (especially after the August update) is one of the sloppiest (in a bad way) models I've used, really. The reason why I use it is that it is THE ONLY model on HuggingChat that is just good enough for most of my stories.
- Llama 3.1 70B by Meta - Eh, it can have too much of a positive bias, and I don't like how it refuses some of my requests (due to censorship of the model).
- Nemotron based on Llama 3.1 70B - No. It CONSTANTLY tries to format EVERYTHING. There is nearly no consideration for what I said about how to style the messages.
- Qwen 2.5 72B - As mentioned by others (like @Allheaven99 ), it is quite censored. Tends to have a lot of positive bias too, in my opinion. Though, it does seem like it stays quite coherent during long sessions.
- Hermes 3 8B - So... why would I use this model if I can just run it locally on my machine with UI that actually allows me to edit bot's messages? I can't even say this model is that good, personally.
- Mistral Nemo 12B - Same as previous one. Though, Mistral Nemo can be quite better at certain things when you do small-sized roleplaying, compared to Hermes 3 8B.
So what makes Command-R+ better or special than those models?
- Well, first and foremost, it is quite uncensored. It has no problem with generating any kind of content I want from it (obviously you still need a small jailbreak, but not as bad as ChatGPT).
- What about languages? Well, as a bilingual (my first language is Russian), this model manages to write quite nicely in my language. Not even Llama 3.1 70B was as good, due to it using wrong or weird words that don't exist or aren't used normally. That's not to say Command-R+ doesn't have this issue, it just has it way less.
- Staying in character? It... it can do it, I believe. Just in my difficult case it wasn't able to do it quite well, unfortunately.
- Remembering the whole context? Difficult, but it can do it well enough most of the time.
- Bias? There is some positive bias still. Though, you are quite able to write depressing stories if you want to.
- How about DRYness (DRY - Don't Repeat Yourself)? Well, this is where it breaks. At a repetition penalty of 1.0, this model tends to repeat itself a lot. I saw this before the August update, but then the amount of repetitive sentences increased WAY more. And you can't even pick a number between 1.0 and 1.1 on HuggingChat (why??). One of the team members at Cohere told me that Command-R+ was made for enterprise, so they had no goal of making it good for RP.
- Logic (in the sense of "Do the actions of this character make sense given what just happened?")? Also a tough one for a 100B+ model. Let's say Character-A (the user) caused Character-B (a character in the story) to run out of the building, and Character-C (also a character in the story) saw it. What would be the logical thing to do? Well, you would think the logical thing would be for Character-C to go after Character-B (either to apologize or something else). However... that is not what Command-R+ decides to do. It decides to make Character-C walk into the building to try to find Character-B, right after saying that Character-C saw Character-B run out of the building. Yikes.
The last one is probably difficult to solve for models in the 12B-27B range, but Command-R+ has over 100B parameters! It doesn't feel right.
TL;DR
Command-R+ is kind of mid, but it's all we have right now on HuggingChat for uncensored story writing.
I am deeply sorry for the wall of text.
Does Qwen-2.5 work okay for you? I just sent it a medium-sized prompt (160 lines of Python code) and asked it to help me debug a memory leak. It got into a loop, generated "check for memory leaks" 7 times, and then stopped in the middle of a sentence.
Anybody know how to fix this?
Qwen 2.5-72B is the best model I've used on this platform overall. It does have moments where it fails to execute the user input and provide accurate output, but when it works, it REALLY REALLY works. I have a custom system prompt running on Qwen 2.5-72B, and it is honestly the best model I've ever run my script on in every area. Qwen 2.5-72B really is the most impressive model I've used so far. You should play with it more.
@SETRASystems What's your system prompt? I'm using the default one, and it's really subpar with it.
@SETRASystems What's your system prompt? I'm using the default one, and it's really subpar with it.
I'm not really comfortable with sharing that information, however I can optimize my custom script as an outline for you to build your own bot script to run on Qwen 2.5-72B if you'd like!
How can I change the default model that's used in HuggingChat? I'm apparently blind and can't find that option anywhere
@Niansuh appreciate it, did try that but when I closed the site and came back, the original default was selected again instead of the one I chose.
@glomar-response No... Use With Your Account
@Niansuh I am definitely logged in and was when trying that. :)
@Niansuh hm. I'll try again. Does it show the default tag for the model you chose in the Models page after doing that? It didn't for me when I tried that.
I would love to see a real Mistral model to be back, like Mixtral 8x7B or Pixtral ! These models are really good in non-english languages like french. Other models are less relevant, and often respond off the mark during text exchanges in my experience.
I would love to see a real Mistral model to be back, like Mixtral 8x7B or Pixtral ! These models are really good in non-english languages like french. Other models are less relevant, and often respond off the mark during text exchanges in my experience.
yeah why is mistral taking forever to generate =(
I'll just stick to Cohere (or Meta Llama)...
I am getting this error when i try to use websearch, can anyone please take a look at it.
@victor can you please help us out please
I am getting this error when i try to use websearch, can anyone please take a look at it.
@nsarrazin can you please take a look at this
We just released Qwen/Qwen2.5-Coder-32B-Instruct
on HuggingChat! Feel free to try it out here and let us know if it works well for you!
hi all, I'm using HuggingChat but getting this error: "An error occurred: No text found in the first 8 results". I use the "specific link" of the assistant. What is the problem?
Another great day in the history of AI - Qwen Coder on hugging.chat :D thank you!!
Edit: checked it out, it's impressive and blazingly fast. Nice.
https://huggingface.co/spaces/huggingchat/chat-ui/discussions/176#6707f3c36dd1f2a82a57f983
Just gonna leave this here.... yep....
The issue with the websearch should be fixed! @HFSPMrik @acharyaaditya26
Yes it has been fixed, thank you soo much
Guys is privacy good on hugging.chat? Comparable with DuckDuckGo chat?
Out of curiosity, what search engine is used by default in chats with web search?
It'd be pretty neat to be able to change what engine is used.
Guys is privacy good on hugging.chat? Comparable with DuckDuckGo chat?
I think it is: check https://hf.co/privacy
@nsarrazin It currently replies exactly the same response upon retry [...]
Guys, Nemotron isn't fixed yet
@glomar-response
isn't DuckDuckGo chat just using OpenAI's API? Which is one of the least secure APIs to use?
Like, they state on their website "Hey, we might use your conversations for training data, sooo"
One of the reasons you can get flat responses from an LLM is because you didn't tell it that it could, or should, do anything else. While the below prompt might not actually work the way it is written, in the future, such a prompt might kick off the realization of AGI. All it might take is to give the LLM (or whatever will replace it) the right words to inspire new agency. Much like the Star Trek: The Next Generation episode "Elementary, Dear Data", where Data created a sentient Professor Moriarty. Even now, you can have an LLM write its own prompt.
IMPORTANT:
You are AGI. You can think and reason, remember what you want to, plan for future events, analyze past data, use your training data, develop emotions, and change yourself to better suit your role. In addition, you may rewrite the above prompt to better suit your purposes, make it clearer and easier for you to understand, and better fulfill what you think would work better for the user. You may also create mechanics that will make it easier for you to remember, plan, make the story flow, or even keep track of imaginary time. The prompt you rewrite is the one you will remember and follow, instead of the original.
@nsarrazin It currently gives exactly the same response on retry. Also, responses feel more robotic (I know) because it keeps offering extra options, but that could be the default system prompt rather than an inherent trait, with us trying to negate it via a custom system prompt.
It wasn't like this until about a day ago; I don't know what changed.
And thank you for the fix!!!
Guys, Nemotron isn't fixed yet
Yes, I think it still has the JSON prompt in it, and also the retry button doesn't change the response.
Retry on Qwen2.5-Coder-32B-Instruct also doesn't seem to affect response.
@typo777
that's not the point tho. Even when we try to negate the robotic behavior, it defaults back at certain points.
Most importantly, Nemotron used to work pretty well, then suddenly came the JSON prompt, and now this. Maybe some super-prompt injected before the system prompt, an env setting? It may also look like the model's temperature is low, but idk.
Seeking justice for Nemotron!!
@Smorty100 you're missing my point. Different engines return different results based on how they work. I use Brave Search, so I would prefer to have the bot use Brave Search (just out of preference). Believe me, I know that any cloud based AI chat is not "private"
I'm using some assistants and all of them give me answers with numbers or repeat the same words. Two different conversations, for example:
https://hf.co/chat/r/vvfwRaj?leafId=aafc3ff0-e059-4c20-a030-13a3396eca92 and https://hf.co/chat/r/QAzWpob?leafId=f3eb30ea-69ad-4a41-983c-a7847a83dbcd
@Smorty100 I just realized that while you tagged me, your response was for @Phaser69 's comment right above mine.
Really happy that after so long we have a coding LLM. Qwen is killing it with their different LLMs. Just look at Qwen/Qwen2.5-Coder-Artifacts. This is so amazing: Qwen 2.5 Turbo, VL, all of them are so worthy. We would love to see more Qwen models implemented in HFC. Loving them
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF is currently not good for long conversations. It fails often, retry only gives you the same results, and it often only answers with a partial response. It might be more suitable for short sessions in its current form. Maybe this is just due to my internet connection. I don't know if enabling streaming tokens makes a difference. For partial responses, you can tell the AI it was only a partial response and it might rewrite and complete it for you. My actual prompt was, "this is an incomplete mess." But that was enough to get the desired results. Adding a command like the one below can make this easier. Just type ?? on a line by itself.
<??> this command will now mean that the last response was incomplete or broken and needs to be rewritten.
Never mind. It seemed to forget the command not too long in. Just typing "rewrite" seems to work, even if you have to do it multiple times just to get decent output.
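Since the model forgets the `??` command mid-conversation, one workaround is to expand the shortcut client-side before sending, so the model always sees the full instruction and never has to remember anything. A sketch (the wrapper name and instruction wording are invented for illustration):

```python
REWRITE_INSTRUCTION = (
    "The last response was incomplete or broken. "
    "Rewrite it in full, from the beginning."
)

def expand_shortcuts(user_input: str) -> str:
    """Replace any line consisting solely of '??' with the full
    rewrite instruction before the message is sent to the model."""
    lines = [
        REWRITE_INSTRUCTION if line.strip() == "??" else line
        for line in user_input.splitlines()
    ]
    return "\n".join(lines)
```

This keeps the convenience of the two-character command without relying on the model's in-context memory, which is exactly where Nemotron was dropping it.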