Have we really squeezed out the capacity of a compact chat model? Thrilled to see our latest open model, Starling-7B, ranks 13th among all models in Chatbot Arena! As a 7B model, Starling surpasses larger open and proprietary models, including Claude-2, GPT-3.5-Turbo, Gemini Pro, Mixtral 8x7B and Llama2-70B, and is currently the best 7B chat model in Chatbot Arena! Try out the model on HF here: Nexusflow/Starling-LM-7B-beta
Exciting breakthrough in LLM reliability! NexusRaven-V2, our cutting-edge function-calling LLM, has set a new standard in minimizing AI hallucinations, surpassing GPT-4's performance in a recent independent third-party research benchmark.
Zero Hallucinations: NexusRaven-V2 showcased remarkable accuracy with zero hallucinations in 840 tests focusing on tool selection and usage, a significant leap over GPT-4, which produced 23 hallucinations.
Enhanced Success Rates: It boasts a 9% higher success rate than GPT-4 in information-seeking applications requiring meticulous attention to detail, and a 4% higher success rate in adversarial scenarios that demand a deep understanding of tool documentation, even with vague tool and API argument names.