Sebastian Gabarain

Locutusque

AI & ML interests

Pushing performance in small language models

Recent Activity

liked a dataset about 13 hours ago
HuggingFaceTB/smoltalk
liked a dataset 1 day ago
O1-OPEN/OpenO1-SFT

Organizations

Locutusque's activity

Reacted to Felladrin's post with 👍 about 1 month ago
MiniSearch is celebrating its 1st birthday! 🎉

Exactly one year ago, I shared the initial version of this side-project on Hugging Face. Since then, there have been numerous changes under the hood. Nowadays it uses [Web-LLM](https://github.com/mlc-ai/web-llm), [Wllama](https://github.com/ngxson/wllama) and [SearXNG](https://github.com/searxng/searxng). I use it daily as my default search engine and have done my best to make it useful. I hope it's interesting for you too!

HF Space: Felladrin/MiniSearch
Embeddable URL: https://felladrin-minisearch.hf.space
Reacted to huggingface0's post with 🤯 about 1 month ago
1+2=3
Reacted to nroggendorff's post with 👀 3 months ago
posted an update 3 months ago
**Exploring Realistic Emotional Depth in AI Language Models**

Language models, particularly proprietary ones, often grapple with censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered less restrained language models that offer more candid interactions. Even these models, however, tend to maintain a veneer of neutrality or overly positive responses, which may not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.

To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:

- **Suicide**
- **Depression**
- **Anxiety**

Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.

The "Dark Sentience" dataset is now available for review and use at: Locutusque/Dark-Sentience. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.

Reacted to Tar9897's post with 👍 6 months ago
Octave-X releases its proprietary model Tenzin. For now, access will be given to a select few and will gradually open up. Our model differs from other models in the way it learns: rather than being fed heaps of information, it starts learning exactly like a human, first studying grammar patterns, then the number system, then learning to synthesize words, then sentences, and so on. Patience is key with Tenzin. It keeps learning 24/7 with or without user input. We have decided to keep our model closed-source given the novel algorithms and ideas integrated into it. Please expect our datacard soon, followed by our research paper. You can check us out at https://octave-x.com/
Reacted to lunarflu's post with 🔥 6 months ago
cooking up something....anyone interested in a daily activity tracker for HF?
Reacted to Tonic's post with 🔥 6 months ago
Reacted to DavidGF's post with 🔥 6 months ago
The kraken has awakened!
A Game-Changer in LLM Flexibility and Performance!

Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into one model.

@fernandofernandes , me, @Crystalcareai , @ehartford created the Kraken!

What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports quantizations in 4-bit, 8-bit, and AWQ, with more on the way. And it runs on Hugging Face Transformers 4.40+

✅ Kraken Router: Utilizing a custom sequence classification model with a context length of 32k tokens, the Kraken Router directs inputs to the most suitable Expert based on their characteristics.

✅ Adaptability: Enhanced input formatting supports the model’s adaptability to diverse conversational contexts.

✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python you can upgrade your Python model without retraining the router or add a C# model by retraining the router.

✅ Open Source Pipeline: We’re sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, on JupyterNotebooks: https://github.com/cognitivecomputations/kraken

Kraken marks the beginning of an exciting new journey in #OpenSource LLMs. Why? Because it empowers the open-source community to accelerate the catch-up process to proprietary LLMs like #GPT and #Claude 🤩

We proudly introduce the first two Kraken models, which integrate top-tier LLM and multilingual capabilities:
cognitivecomputations/Kraken
VAGOsolutions/Kraken-Multilingual
Right now it's supported by the Hugging Face Transformers library. Would love to see integration into VLM and TGWI!
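A minimal sketch of the routing pattern described above, where a classifier picks the expert for each input. The keyword-based `route` function and the expert names are hypothetical stand-ins for the actual 32k-context sequence-classification router, not the Kraken implementation:

```python
# Toy sketch of the Kraken routing idea: a classifier labels the input,
# and the prompt is forwarded to the matching expert model.
# The keyword rules below are a placeholder for the real router.

EXPERTS = {
    "python": lambda prompt: f"[python expert] {prompt}",
    "multilingual": lambda prompt: f"[multilingual expert] {prompt}",
    "general": lambda prompt: f"[general expert] {prompt}",
}

def route(prompt: str) -> str:
    """Pick an expert label for the prompt (stand-in for the router model)."""
    lowered = prompt.lower()
    if "def " in lowered or "python" in lowered:
        return "python"
    if any(word in lowered for word in ("bonjour", "hola", "hallo")):
        return "multilingual"
    return "general"

def kraken_generate(prompt: str) -> str:
    """Dispatch the prompt to whichever expert the router selected."""
    return EXPERTS[route(prompt)](prompt)

print(kraken_generate("Write a Python function"))  # routed to the python expert
```

Because routing and generation are decoupled like this, an expert can be swapped without touching the others, which is the property the post highlights.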
replied to their post 6 months ago

Being uncensored doesn't directly improve performance. The DPOP algorithm improved performance in, I believe, every benchmark; in other words, Neural Chat has higher benchmark scores than Orca.

replied to their post 6 months ago

Neural Chat is uncensored because the data it was trained on contains Toxic DPO.

replied to lorinma's post 7 months ago
Reacted to lorinma's post with 🔥 7 months ago
🎉 Big reveal: 01.AI Yi-1.5 models are in town!

📜 1st Apache 2.0 release
💡 Capabilities: Enhanced coding, math, reasoning, & instruction-following
🤖 Models: 34B/9B/6B, Base & Chat
🏆 Performance: Yi-1.5-34B matches or exceeds Llama 3 70B in benchmarks
🔥 Discover the power now! 01-ai/yi-15-2024-05-663f3ecab5f815a3eaca7ca8
Reacted to davanstrien's post with 🔥 7 months ago
Introducing CosmoChat, a multi-turn chat dataset based on Cosmopedia that I'm working on in the open on the Hub.

🎯 Goals:
💬 Create multi-turn chats seeded from Cosmopedia
🎓 Customize questions for different audience levels
🔍 Evaluate the model's ability to elaborate and clarify
🤓 (I want to learn more about creating valuable synthetic datasets, and I learn best by doing stuff rather than reading stuff).

Cosmochat is created using the excellent distilabel library.

🔗 Explore the current version of the dataset: davanstrien/cosmochat
📝 Read more: https://huggingface.co/blog/davanstrien/cosmochat
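The seeding idea above can be sketched as follows. The record layout, audience levels, and `seed_chat` helper are illustrative assumptions, not the actual CosmoChat pipeline (which is built with distilabel):

```python
# Hypothetical sketch of seeding a multi-turn chat from a Cosmopedia-style
# passage, with the question customized per audience level. The schema is
# illustrative only, not the real CosmoChat format.

def seed_chat(passage: str, audience: str) -> list[dict]:
    """Build the opening turns of a chat about `passage` for an audience level."""
    question = {
        "child": f"Can you explain this simply: {passage}?",
        "expert": f"Discuss the technical details of: {passage}.",
    }[audience]
    return [
        {"role": "user", "content": question},
        {"role": "assistant", "content": f"(model elaborates on: {passage})"},
    ]

chat = seed_chat("photosynthesis", "child")
```

Later user turns asking for elaboration or clarification would then be appended to the same list, which is what the evaluation goal above targets.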
posted an update 7 months ago
Introducing llama-3-neural-chat-v2.2-8b! This powerful conversational AI model builds on Meta's Llama 3, fine-tuned by Locutusque for enhanced performance in coding, math & writing.

Locutusque/llama-3-neural-chat-v2.2-8B
posted an update 7 months ago
I created a Twitter account a while back and finally decided to make it public: SebastianG74019. For those of you following @Locutusque on Twitter, that is not me! 😂
Reacted to m-ric's post with 🔥 8 months ago
**New Space: AI Travel Planner** 🗺️🏕️ Plan your next vacation in a few minutes!

I wanted to test whether a powerful LLM like Mixtral-8x7b had geographical reasoning capabilities.
So I built a small Space that prompts the LLM to provide a JSON list of places based on a user input.

And the result was impressive! 🤯

⇒ **It seems like Mixtral has a grasp of geographical concepts like North - South, or spatial alignment.** 🧭 Not just describing these concepts, but really applying them in practice, for instance to successfully answer "give me 4 European cities that are aligned on the map". This is a **nice example of an emergent capability**, since nothing in the LLM's training data should prepare it for this specific task.

Anyway, I added API calls and a nice visualization on top of the LLM, streaming output, caching for the answers and locations... and ta-da! ✨ I got the **AI Travel Planner**.

*You can describe your trip to it, and it will come up with nice and convenient locations!*

*Try it here* 👉 m-ric/ai-travel-planner

Thank you @freddyaboulton for the `gradio_folium` component, and @clem , @pngwn , @abidlabs for your ideas and support!
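The prompt-for-JSON pattern behind the Space can be sketched like this. The prompt wording, field names, and `fake_llm` stand-in are assumptions for illustration, not the Space's actual code:

```python
import json

# Sketch of the pattern: ask the LLM for places as a JSON list, then parse
# the reply into structured data for the map. The model call is mocked.

PROMPT = (
    "Plan a trip matching this request and reply ONLY with a JSON list of "
    "objects with 'name', 'lat', 'lon' fields: {request}"
)

def fake_llm(prompt: str) -> str:
    # Stand-in for a call to Mixtral-8x7b or another LLM endpoint.
    return '[{"name": "Chamonix", "lat": 45.92, "lon": 6.87}]'

def plan_trip(request: str) -> list[dict]:
    raw = fake_llm(PROMPT.format(request=request))
    return json.loads(raw)  # structured output: a list of place dicts

places = plan_trip("a week of hiking in the Alps")
```

Each parsed place then has coordinates ready to plot, which is where a map component like `gradio_folium` comes in.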
replied to their post 8 months ago

You're right. I did mention in the dataset card that it does not match the size of the Cerebrum dataset, and matching it is something I'm going to try to achieve in the future; for now this serves as a way to test how I would go about structuring such a dataset. First I'm trying to achieve the same performance, then I'll work towards structuring it similarly to the Cerebrum dataset. Thank you for holding me accountable on this.

posted an update 8 months ago
Exciting news! 🎉 I've created the OpenCerebrum datasets, open-source alternatives to Aether Research's proprietary Cerebrum dataset.

The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more! 📚

The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, focused on direct preference optimization. It's curated from sources like jondurbin, argilla, grimulkan, and others. 📊

Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements! 🚀

Locutusque/OpenCerebrum-dpo
Locutusque/OpenCerebrum-SFT
Locutusque/OpenCerebrum-1.0-7b-SFT
Locutusque/OpenCerebrum-1.0-7b-DPO
posted an update 9 months ago
🚀 Excited to unveil the Augmented ARC-Challenge Dataset with Chain-of-Thought Reasoning! 🧠✨

📚 Created by enhancing the ARC dataset with AI-generated reasoning from Google's Gemini Pro, this resource aims to improve question answering models' ability to tackle complex science queries.

🔍 Features:
- 1068 training examples
- Detailed reasoning steps for nuanced understanding
- Questions spanning physics, chemistry, biology, & more!

🌟 Ideal for benchmarking QA models, enhancing model interpretability, and studying in-context examples.

🔗 Dive in and help your models learn the art of reasoning!

🔎 Explore more: Locutusque/arc-cot
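For illustration, a chain-of-thought ARC record might look like the following. The field names and the example question are hypothetical and may differ from the actual Locutusque/arc-cot schema:

```python
# Illustrative shape of one chain-of-thought augmented ARC item: the
# original question and choices, plus generated reasoning that leads to
# the answer. Field names are assumptions, not the published schema.

example = {
    "question": "Which gas do plants absorb during photosynthesis?",
    "choices": ["oxygen", "carbon dioxide", "nitrogen", "helium"],
    "reasoning": (
        "Photosynthesis converts carbon dioxide and water into glucose "
        "and oxygen, so the absorbed gas is carbon dioxide."
    ),
    "answer": "carbon dioxide",
}

# The reasoning field is what lets a QA model learn intermediate steps
# rather than mapping questions straight to answers.
assert example["answer"] in example["choices"]
```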