Social Post Explorers

Recent Activity

social-post-explorers's activity

as-cle-bert
posted an update about 18 hours ago
🎉Early New Year releases🎉

Hi HuggingFacers🤗, I decided to ship early this year, and here's what I came up with:

PdfItDown (https://github.com/AstraBert/PdfItDown) - If you're like me, and you have your whole RAG pipeline optimized for PDFs but not for other data formats, here is your solution! With PdfItDown, you can convert Word documents, presentations, HTML pages, markdown sheets and (why not?) CSVs and XMLs to PDF format, for seamless integration with your RAG pipelines. Built upon MarkItDown by Microsoft.
GitHub Repo 👉 https://github.com/AstraBert/PdfItDown
PyPI Package 👉 https://pypi.org/project/pdfitdown/

SenTrEv v1.0.0 (https://github.com/AstraBert/SenTrEv/tree/v1.0.0) - If you need to evaluate the retrieval performance of your text embedding models, I have good news for you🥳🥳
The new release of SenTrEv now supports dense and sparse retrieval (thanks to FastEmbed by Qdrant) with text-based file formats (.docx, .pptx, .csv, .html, .xml, .md, .pdf) and new relevance metrics!
GitHub repo 👉 https://github.com/AstraBert/SenTrEv
Release Notes 👉 https://github.com/AstraBert/SenTrEv/releases/tag/v1.0.0
PyPI Package 👉 https://pypi.org/project/sentrev/

Happy New Year and have fun!🥂
as-cle-bert
posted an update 4 days ago
Hi HF Community!🤗

As my last 2024 contribution, I decided to write an article about a Competitive Debate Championship simulation I ran with 5 LLMs as competitors and 2 as judges:

https://huggingface.co/blog/as-cle-bert/debate-championship-for-llms

The article covers code, analyses and results, and you can find everything to reproduce this tournament in the GitHub repo 👉 https://github.com/AstraBert/DebateLLM-Championship

I also released a dataset related to the data (motions, arguments, topics, winners...) collected during the tournament 👉 as-cle-bert/DebateLLMs

Happy reading and happy new yeAIr!🎉
as-cle-bert
posted an update 7 days ago
as-cle-bert
posted an update 9 days ago
Hi HuggingFacers!🤶🏼

As my last 2024 project, I've dropped a Discord Bot that knows a lot about Pokémon🦋

GitHub 👉 https://github.com/AstraBert/Pokemon-Bot
Demo Space 👉 as-cle-bert/pokemon-bot

The bot integrates:
- Chat features (Cohere's Command-R) with RAG functionalities (hybrid search and reranking with Qdrant) and chat memory (managed through PostgreSQL) to produce information about Pokémon
- Image-based search to identify Pokémon from their images (via Qdrant)
- Random card package extraction and description
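
The post does not say how the dense and sparse results are combined before reranking. One common way to fuse two ranked lists is reciprocal rank fusion (RRF); the sketch below assumes that approach. The document IDs and the constant k=60 are illustrative and not taken from the bot's code.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into a single ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document earns more credit the higher it appears in each list
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; ties are broken by ID for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

dense_hits = ["pikachu", "raichu", "pichu"]    # hypothetical dense-retrieval results
sparse_hits = ["pikachu", "eevee", "raichu"]   # hypothetical sparse-retrieval results
fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

A reranker would then rescore only the top of the fused list, which keeps the expensive step small.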

HuggingFace🤗, as usual, plays the most important role in the application stack, with the following models:

- sentence-transformers/LaBSE
- prithivida/Splade_PP_en_v1
- facebook/dinov2-large

And datasets:

- Karbo31881/Pokemon_images
- wanghaofan/pokemon-wiki-captions
- TheFusion21/PokemonCards

Have fun!🍕
akhaliq
posted an update 14 days ago
Google drops Gemini 2.0 Flash Thinking

a new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with thoughts visible), can solve complex problems with Flash speeds, and more

now available in anychat, try it out: akhaliq/anychat
m-ric
posted an update 14 days ago
After 6 years, BERT, the workhorse of encoder models, finally gets a replacement: Welcome ModernBERT! 🤗

We talk a lot about ✨Generative AI✨, meaning "Decoder version of the Transformer architecture", but this is only one of the ways to build LLMs: encoder models, which turn a sentence into a vector, are maybe even more widely used in industry than generative models.

The workhorse for this category has been BERT since its release in 2018 (that's prehistory for LLMs).

It's not a fancy 100B-parameter supermodel (just a few hundred million parameters), but it's an excellent workhorse, kind of a Honda Civic for LLMs.

Many applications use BERT-family models - the top models in this category accumulate millions of downloads on the Hub.

➡️ Now a collaboration between Answer.AI and LightOn just introduced BERT's replacement: ModernBERT.

TL;DR:
🏛️ Architecture changes:
⇒ First, standard modernizations:
- Rotary positional embeddings (RoPE)
- GeGLU in place of GeLU
- Flash Attention 2
✨ The team also introduced innovative techniques like alternating attention instead of full attention, and sequence packing to get rid of padding overhead.
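
To make the padding overhead concrete, here is a minimal sketch comparing a padded batch with a packed one. The sequence lengths and the greedy first-fit strategy are illustrative assumptions; ModernBERT's actual unpadding and packing logic is more involved.

```python
def padded_slots(lengths: list[int]) -> int:
    """Token slots used when every sequence is padded to the longest one."""
    return max(lengths) * len(lengths)

def pack(lengths: list[int], max_len: int) -> list[list[int]]:
    """Greedy first-fit: place each sequence in the first row that still has room."""
    rows: list[list[int]] = []
    for n in sorted(lengths, reverse=True):
        for row in rows:
            if sum(row) + n <= max_len:
                row.append(n)
                break
        else:
            rows.append([n])  # no existing row had room, so open a new one
    return rows

lengths = [512, 30, 200, 45, 120, 100]  # hypothetical token counts
rows = pack(lengths, max_len=512)
print("real tokens: ", sum(lengths))
print("padded slots:", padded_slots(lengths))
print("packed slots:", len(rows) * 512)
```

With packing, the attention mask must keep sequences that share a row from attending to each other; that bookkeeping is omitted here.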

🥇 As a result, the model tops the game of encoder models:
It beats the previous standard, DeBERTaV3, with 1/5th the memory footprint, and runs 4x faster!

Read the blog post 👉 https://huggingface.co/blog/modernbert
m-ric
posted an update 15 days ago
Hugging Face releases Picotron, a microscopic lib that solves LLM training 4D parallelization 🥳

🕰️ Llama-3.1-405B took 39 million GPU-hours to train, i.e. about 4.5 thousand years on a single GPU.

👴🏻 If they had needed all this time, we would have GPU stories from the time of Pharaoh 𓂀: "Alas, Lord of Two Lands, the shipment of counting-stones arriving from Cathay was lost to pirates, this shall delay the building of your computing temple by many moons"

🛠️ But instead, they just parallelized the training on 24k H100s, which made it take just a few months.
This required parallelizing across 4 dimensions: data, tensor, context, pipeline.
And it is notoriously hard to do, making for bloated code repos that hold together only by magic.
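
As a quick sanity check on those numbers, the arithmetic can be sketched as follows. It assumes perfect scaling across GPUs and no downtime, which real training runs do not achieve, so the parallel figure is a lower bound.

```python
GPU_HOURS = 39_000_000      # total training cost quoted in the post
NUM_GPUS = 24_000           # "24k H100s", as quoted in the post
HOURS_PER_YEAR = 24 * 365

# Time if a single GPU did all the work
years_on_one_gpu = GPU_HOURS / HOURS_PER_YEAR

# Wall-clock time if the work were split perfectly across all GPUs
days_in_parallel = GPU_HOURS / NUM_GPUS / 24

print(f"One GPU: about {years_on_one_gpu:,.0f} years")
print(f"{NUM_GPUS:,} GPUs: about {days_in_parallel:.0f} days")
```

That comes to roughly 4,450 years on one GPU versus about 68 days of ideal wall-clock time, consistent with "a few months" once real-world inefficiencies are added.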

🤏 But now we don't need huge repos anymore! Instead of building mega-training codebases, Hugging Face colleagues cooked in the other direction, towards tiny 4D parallelism libs. One team built Nanotron, already widely used in industry.
And now another team has released Picotron, a radical approach that codes 4D parallelism in just a few hundred lines, a real engineering feat that makes it much easier to understand what's actually happening!

⚡ It's tiny, yet powerful:
Counting in MFU (Model FLOPs Utilization, how much of the available compute the model actually uses), this lib reaches ~50% on the SmolLM-1.7B model with 8 H100 GPUs, which is really close to what huge libs would reach. (Caution: the team is running further benchmarks to verify this)
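
For readers unfamiliar with MFU, here is a rough sketch of how such a figure can be estimated. It assumes the common approximation of about 6 FLOPs per parameter per token for a forward and backward pass (ignoring attention FLOPs), and an assumed dense BF16 peak of 989 TFLOP/s per H100. The throughput value is hypothetical and chosen only to illustrate the formula; it is not a Picotron measurement.

```python
def mfu(n_params: float, tokens_per_second: float,
        n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs Utilization: achieved training FLOP/s over theoretical peak."""
    # ~6 FLOPs per parameter per token (forward + backward), an approximation
    achieved = 6 * n_params * tokens_per_second
    return achieved / (n_gpus * peak_flops_per_gpu)

H100_PEAK_BF16 = 989e12  # assumed dense BF16 peak per GPU, in FLOP/s

# Hypothetical aggregate throughput across all 8 GPUs
utilization = mfu(n_params=1.7e9, tokens_per_second=390_000,
                  n_gpus=8, peak_flops_per_gpu=H100_PEAK_BF16)
print(f"MFU: {utilization:.1%}")
```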

Go take a look 👉 https://github.com/huggingface/picotron/tree/main/picotron
m-ric
posted an update 20 days ago
Potential paradigm shift in LLMs: new paper by Meta claims that we can get rid of tokenizers! 🥳

Current LLMs process text by first splitting it into tokens. They use a module named "tokenizer", that -spl-it-s- th-e- te-xt- in-to- arbitrary tokens depending on a fixed dictionary.
On the Hub you can find this dictionary in a model's files under tokenizer.json.

➡️ This process is called BPE tokenization. It is suboptimal, and everyone says so. It breaks text into predefined chunks that often fail to capture the nuance of language. But it has been a necessary evil in language models since their inception.

💥 In Byte Latent Transformer (BLT), Meta researchers propose an elegant solution by eliminating tokenization entirely, working directly with raw bytes while maintaining efficiency through dynamic "patches."

This had been tried before with different byte-level tokenizations, but it's the first time that an architecture of this type scales as well as BPE tokenization. And it could mean a real paradigm shift! 👏👏

🏗️ Architecture:
Instead of a lightweight tokenizer, BLT has a lightweight encoder that processes raw bytes into patches. Then the patches are processed by the main heavy-duty transformer as we do normally (but for patches of bytes instead of tokens), before converting back to bytes.

🧩 Dynamic Patching:
Instead of fixed tokens, BLT groups bytes based on their predictability (measured by entropy) - using more compute for complex sequences and efficiently handling simple ones. This allows efficient processing while maintaining byte-level understanding.
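
A toy illustration of the patching idea, under loud assumptions: BLT estimates next-byte entropy with a small byte-level language model, while this sketch substitutes the surprisal of each byte under a unigram frequency model fit on the input itself. The threshold and the input string are made up. It only shows the mechanism of cutting a new patch where bytes become hard to predict.

```python
import math
from collections import Counter

def byte_surprisal(data: bytes) -> list[float]:
    """Surprisal (-log2 p) of each byte under a unigram model fit on the data."""
    counts = Counter(data)
    total = len(data)
    return [-math.log2(counts[b] / total) for b in data]

def patch_by_surprisal(data: bytes, threshold: float) -> list[bytes]:
    """Start a new patch whenever a byte is more surprising than the threshold."""
    patches, current = [], bytearray()
    for b, s in zip(data, byte_surprisal(data)):
        if current and s > threshold:
            patches.append(bytes(current))
            current = bytearray()
        current.append(b)
    if current:
        patches.append(bytes(current))
    return patches

text = ("a" * 16 + " Zq " + "a" * 16).encode("utf-8")
patches = patch_by_surprisal(text, threshold=3.0)
print(patches)
```

Predictable runs end up in long patches, while rare bytes get short patches of their own, so the heavy model spends more steps where the content is harder.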

I hope this breakthrough is confirmed and we can get rid of all the tokenizer stuff: it would make model handling easier!

Read their paper here 👉 https://dl.fbaipublicfiles.com/blt/BLT__Patches_Scale_Better_Than_Tokens.pdf
as-cle-bert
posted an update 22 days ago
m-ric
posted an update 22 days ago
💥 Google releases Gemini 2.0, starting with a Flash model that steamrolls GPT-4o and Claude-3.6 Sonnet! And they start a huge effort on agentic capabilities.

🚀 The performance improvements are crazy for such a fast model:
‣ Gemini 2.0 Flash outperforms the previous 1.5 Pro model at twice the speed
‣ Now supports both input AND output of images, video, audio and text
‣ Can natively use tools like Google Search and execute code

➡️ If the price is on par with the previous Flash iteration ($0.30 / M tokens, compared with GPT-4o's $1.25), the competition will have a big problem with this 4x cheaper model that gets better benchmarks 🤯

🤖 What about the agentic capabilities?

‣ Project Astra: A universal AI assistant that can use Google Search, Lens and Maps
‣ Project Mariner: A Chrome extension that can complete complex web tasks (83.5% success rate on WebVoyager benchmark, this is really impressive!)
‣ Jules: An AI coding agent that integrates with GitHub workflows

I'll be eagerly awaiting further news from Google!

Read their blogpost here 👉 https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/
m-ric
posted an update 23 days ago
Scaling laws are not dead yet! New blog post suggests Anthropic might have an extremely strong Opus-3.5 already available, but is not releasing it to keep their edge over the competition. 🧐

❓Since the release of Opus-3.5 has been delayed indefinitely, there have been lots of rumors and articles about LLMs plateauing. According to these rumors, scaling laws, the main driver of LLM capability gains, may have stopped holding, which would explain the stalled progress.

These rumors were quickly denied by many people at the leading LLM labs, including OpenAI and Anthropic. But these people would be expected to hype the future of LLMs even if scaling laws really plateaued, so the jury is still out.

🗞️ This new article by SemiAnalysis (generally a good source, specifically on hardware) provides a counter-rumor that I find more convincing:

➡️ Maybe scaling laws still work, Opus-3.5 is ready and as good as planned, but they just don't release it because the synthetic data it helps provide can bring cheaper/smaller models like Sonnet and Haiku up in performance, without the risk of leaking this precious high-quality synthetic data to competitors.

Time will tell! I feel like we'll know more soon.

Read the article: https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-infrastructure-orion-and-claude-3-5-opus-failures/
m-ric
posted an update 24 days ago
Last week was crazy in open-source AI, with important model and dataset releases every day.

Here are the most important ones I've pinned:

🌎 Cohere released Global-MMLU, a multilingual version of MMLU, to evaluate AI models' world knowledge in many languages!

🦙 Meta released Llama-3.3-70B-Instruct, a 70B model that's on par with Llama-3.1-405B-Instruct, GPT-4o and Claude. Probably my new go-to for agentic workflows.

🔉 FishAudio released fish-speech-1.5, a multilingual text-to-speech model

🎨 Microsoft Research released TRELLIS, an extremely impressive image-to-3D model, which you can try here: JeffreyXiang/TRELLIS

📚 Yesterday, Hugging Face released FineWeb 2, a new version that extends the previous FineWeb to over 1000 languages, including extended coverage in Russian, Mandarin, German, Japanese, Spanish and French: a huge, high-quality dataset of > 3 trillion words! HuggingFaceFW/fineweb-2

Now let's go build to make this week as productive as last one!
reach-vb
posted an update 27 days ago
VLMs are going through quite an open revolution, AND in on-device-friendly sizes:

1. Google DeepMind w/ PaliGemma2 - 3B, 10B & 28B: google/paligemma-2-release-67500e1e1dbfdd4dee27ba48

2. OpenGVLab w/ InternVL 2.5 - 1B, 2B, 4B, 8B, 26B, 38B & 78B: https://huggingface.co/collections/OpenGVLab/internvl-25-673e1019b66e2218f68d7c1c

3. Qwen w/ Qwen 2 VL - 2B, 7B & 72B: Qwen/qwen2-vl-66cee7455501d7126940800d

4. Microsoft w/ FlorenceVL - 3B & 8B: https://huggingface.co/jiuhai

5. Moondream2 w/ 0.5B: https://huggingface.co/vikhyatk/

What a time to be alive! 🔥
BramVanroy
posted an update 28 days ago
In the spirit of "Better late than never", I've finally written a brief overview paper for GEITje 7B Ultra. It was initially released 10 months ago (oops), but is still reaching around 1300 monthly downloads across the HF ecosystem (not including ollama).

GEITje 7B Ultra: A Conversational Model for Dutch (2412.04092)

While the paper discusses the model a little bit, I especially wanted to write about the datasets, which to this day seem an important asset for Dutch LLM training (SFT and preference tuning). We have a long way to go for Dutch, but publishing transparent and reproducible artefacts seems an important step to me, alongside having open discussions about data, bias, architectures.

In that spirit, thanks are in order for the creation of GEITje 7B Ultra and all related datasets:

- Michiel Buisman and UWV for providing the means to create the datasets
- Flemish Supercomputer Center (VSC) for the compute
- The Hugging Face Fellows and rest of the team for their discussions and insights
- The Dutch NLP community, notably @Rijgersberg for building the base GEITje model and the fruitful discussions we've had

More to come, step by step!

BramVanroy/geitje-7b-ultra-65c1ee010ad80fd1f6a8f208
m-ric
posted an update 30 days ago
ShowUI: a small end-to-end agent that can navigate any UI and outperforms much bigger systems! 📲

A team from NUS and Microsoft just released an agent that can act on any UI (Desktop, Android, Web) without needing additional text information. It works extremely well: they applied their method to a tiny Qwen2-VL-2B and managed to beat methods that use much more powerful vision models (like GPT-4V), without using any additional info (e.g. the DOM of a webpage) as previous methods did! 👏👏

They started from the idea that most existing methods rely heavily on text, which makes them less generalizable, while leaving aside the rich UI structure that users actually rely on when navigating these interfaces.

⚙️ They put several good ideas to work:

💡 Simplify screenshots to the max:
They heavily prune the visual content of UI screenshots by removing redundant image patches (for example, any vast area of the same color is reduced to a small patch, while positional embeddings are maintained), then group patches from the same GUI elements together to simplify even further.
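
The pruning step can be pictured with a small sketch. This is my own simplified reading of the description, not the paper's implementation: each patch is reduced to a single color label, adjacent patches with the same label form a region, and one representative per region is kept together with its original grid position. The grid below is hypothetical.

```python
from collections import deque

def prune_patches(grid: list[list[str]]) -> list[tuple[int, int, str]]:
    """Keep one (row, col, color) representative per region of identical patches."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    kept = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            color = grid[r][c]
            kept.append((r, c, color))  # representative keeps its position
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # flood-fill the rest of the same-color region
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return kept

# Hypothetical 3x4 screenshot: W = white background, B = blue button, T = text
screenshot = [list("WWWW"), list("WBBW"), list("WWTW")]
kept = prune_patches(screenshot)
print(f"{len(kept)} patches kept out of {sum(len(row) for row in screenshot)}")
```

Because each kept patch retains its (row, col) index, the model can still be given the original positional embedding for it.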

💡 Build a truly generalist dataset:
To train a general UI agent, you need trajectories from each possible UI, expressed in a common language. The authors merge datasets like OmniAct for Desktop, Mind2Web for websites, and AMEX for Android trajectories to create a high-quality and diverse dataset.

➡️ Nice results ensued:
They fine-tune a tiny Qwen2-VL-2B with their method, and it reaches SOTA on several tasks (element identification, web navigation), even beating methods that either use additional info from the DOM or use much bigger VLMs like GPT-4V! 🏆

And performance could certainly jump with a slightly bigger vision model. Let's hope the community builds this soon! 🚀

Paper added to my "Agents" collection 👉 m-ric/agents-65ba776fbd9e29f771c07d4e
m-ric
posted an update about 1 month ago
Need a measure of traction for a GitHub repo, one more reliable than GitHub star history (which is a bit too hype-driven)? 📈

➡️ I've made a Space to visualize PyPI downloads.

Try it here 👉 m-ric/package-download-history
m-ric
posted an update about 1 month ago
🤖 Adobe's code-generating agent reaches the top of GAIA leaderboard - and their paper cites my work!

💡 Reminder: in short, agentic systems are a vehicle in which you put your LLM to give it access to the outside world.

➡️ The team of researchers at Adobe started from the idea that current agentic systems lack the ability to define their own tools. So they decided to make an agent that writes actions as code, thus allowing it to write Python functions that can be re-used later as tools!

Here's what the LLM generations can look like with the proper prompt:

Thought: I need to access the Excel file using a different method.
Action:
def access_excel_file(file_path):
	... # rest of the code (the agent does write it, but I don't have room in this post)
	return rows


Then your system executes this and appends the observation to the agent's memory.
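
To make the "execute, observe, reuse" loop concrete, here is a minimal sketch. It is a hypothetical illustration, not DynaSaur's or transformers.agents' actual implementation: the function name `run_action`, the sample action, and the convention of storing the result in a variable called `observation` are all assumptions, and `exec` is used without any sandboxing.

```python
import types

def run_action(action_code: str, tools: dict, memory: list) -> list[str]:
    """Execute a code action, record its observation, and register new functions as tools."""
    namespace = dict(tools)        # existing tools are visible to the action
    exec(action_code, namespace)   # unsandboxed: acceptable for a sketch only
    # Any function the action defined becomes a reusable tool
    new_tools = [
        name for name, obj in namespace.items()
        if isinstance(obj, types.FunctionType) and name not in tools
    ]
    for name in new_tools:
        tools[name] = namespace[name]
    # Append the observation (if the action produced one) to the agent's memory
    memory.append(namespace.get("observation"))
    return new_tools

tools: dict = {}
memory: list = []
# Hypothetical LLM generation (the "Action" part only)
action = '''
def count_rows(rows):
    return len(rows)

observation = count_rows([["a", 1], ["b", 2]])
'''
added = run_action(action, tools, memory)
print(added, memory)
```

On a later step, the agent can call `count_rows` directly instead of rewriting it, which is the composability benefit the paper argues for.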

Why is this code formulation better than classical tool use formulation as JSON? The paper explains:

"Most existing work uses text or JSON as the representation of actions, which significantly lacks the two criteria mentioned earlier: generality and composability. In contrast, DynaSaur can utilize available actions or create new ones if necessary, using code as a unified representation. In principle, acting with code enables agents to solve any Turing-complete problem."

The idea of using code is not new: in fact, we do it in transformers.agents (thus the citation that I got). Their implementation adds further refinements, like using RAG to retrieve relevant functions before generating an action, which increases performance further.

And they observe that code agents perform much better, reaching the top of GAIA leaderboard! 🥇

Go take a look, it's really clear and informative!

Paper added to my agents collection 👉 m-ric/agents-65ba776fbd9e29f771c07d4e
as-cle-bert
posted an update about 1 month ago
Hi HuggingFacers!🤗
December is here and the time has come, for most of us, to wrap up our code projects and take stock of our 2024 contributions🗓️
To do this, I made a small Gradio application, what-a-git-year:

as-cle-bert/what-a-git-year

that scrapes information from your GitHub profile and summarizes it, also producing nice plots📊
You can also find the GitHub repo here: https://github.com/AstraBert/what-a-git-year ⭐

Hope that everyone had a Git year!🎉
m-ric
posted an update about 1 month ago
Single most important thing to do today: go try QwQ on Hugging Chat!

👉 https://huggingface.co/chat/models/Qwen/QwQ-32B-Preview