Louis Brulé Naudet

louisbrulenaudet

AI & ML interests

Research in business taxation and development (NLP, LLM, Computer vision...), University Dauphine-PSL 📖 | Backed by the Microsoft for Startups Hub program and Google Cloud Platform for startups program.

Organizations

louisbrulenaudet's activity

replied to their post 9 days ago

Hi Julius,

The error message indicates that the token you are using is invalid. The explanation may lie in the model itself: have you accepted its license (I am thinking in particular of the Llama models), and if so, have you configured a read access token in your settings?
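
For what it's worth, a quick way to check whether a token is valid is to log in with it from huggingface_hub and ask who you are; the token below is a placeholder for a read access token created under Settings → Access Tokens:

from huggingface_hub import login, whoami

login(token="hf_xxx")  # placeholder: a *read* access token from your settings
print(whoami())  # should return your account details if the token is accepted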

posted an update 10 days ago
Mixtral or Llama 70B on Google Spreadsheet thanks to Hugging Face's Serverless Inference API 🤗

The add-on is now available from the "Journalists on Hugging Face" organization on the Hub and allows rapid generation of synthetic data, automatic translation, question answering, and more from simple spreadsheet cells 🖥️

Link to the 🤗 Space : JournalistsonHF/huggingface-on-sheets

Although this tool was initially developed for journalists, it actually finds a much wider audience among daily users of the Google suite, and many use cases remain to be explored.

Only a free Hugging Face API key is required to start using this no-code extension.
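
For anyone curious about what the extension does behind the scenes, here is a minimal sketch, in Python rather than the Apps Script the add-on actually runs on, of calling the Serverless Inference API with such a key (the model name and prompt are placeholders):

from huggingface_hub import InferenceClient

# Placeholder model; the add-on exposes e.g. Mixtral and Llama 70B endpoints.
client = InferenceClient(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    token="hf_xxx",  # your free Hugging Face API key
)

# Roughly what happens for the text contained in a spreadsheet cell.
print(client.text_generation("Translate to French: Hello, world!", max_new_tokens=64))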

Do not hesitate to submit ideas for features that we could add!

Thanks to @fdaudens for initiating this development.
replied to fdaudens's post 15 days ago

Wonderful, I love the demo! Is there already a GitHub repo for the project?

Thanks a lot and have a nice day.

posted an update about 1 month ago
I've just open-sourced RAGoon, a small utility I use to integrate knowledge from the web into LLM inference, built on Groq's speed and plain Google Search performance ⚡

RAGoon is a Python library available on PyPI that aims to improve the performance of language models by providing contextually relevant information through retrieval-based querying, parallel web scraping, and data augmentation techniques. It offers an integration of various APIs (OpenAI, Groq), enabling users to retrieve information from the web, enrich it with domain-specific knowledge, and feed it to language models for more informed responses.
from groq import Groq
# from openai import OpenAI
from ragoon import RAGoon

# Initialize RAGoon instance
ragoon = RAGoon(
    google_api_key="your_google_api_key",
    google_cx="your_google_cx",
    completion_client=Groq(api_key="your_groq_api_key")
)

# Search and get results
query = "I want to do a left join in python polars"
results = ragoon.search(
    query=query,
    completion_model="Llama3-70b-8192",
)

# Print list of results
print(results)

For the time being, this project remains simple, but can easily be integrated into a RAG pipeline.
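
As an illustration of that integration, and assuming results behaves like a list of text snippets (which is how I read the example above, not RAGoon's documented return type), the retrieved context could be passed to a Groq chat completion along these lines:

from groq import Groq

# Hypothetical follow-up: ground a chat completion on the retrieved snippets.
context = "\n\n".join(str(result) for result in results[:5])

client = Groq(api_key="your_groq_api_key")
completion = client.chat.completions.create(
    model="llama3-70b-8192",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
    ],
)
print(completion.choices[0].message.content)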

Link to GitHub : https://github.com/louisbrulenaudet/ragoon
posted an update about 1 month ago
Integrating the French Taxation Embedding Benchmark Task (beta) into the MTEB 🤗

I'm excited to announce the integration of the French Taxation Embedding Benchmark task into the Massive Text Embedding Benchmark (MTEB).

This addition expands the diverse set of tasks available within MTEB, enabling researchers and practitioners to develop and evaluate retrieval models focused on retrieving relevant tax articles or content based on provided queries.

Link to the 🤗 Dataset : louisbrulenaudet/tax-retrieval-benchmark

Link to the GitHub repo : https://github.com/louisbrulenaudet/tax-retrieval-benchmark
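
As a minimal sketch of how such a task is typically run through MTEB (the task identifier and the embedding model below are assumptions on my part, not necessarily the names registered upstream):

from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any sentence embedding model can be plugged in; this one is only an example.
model = SentenceTransformer("intfloat/multilingual-e5-base")

# Hypothetical identifier for the French taxation retrieval task.
evaluation = MTEB(tasks=["TaxRetrievalBenchmark"])
evaluation.run(model, output_folder="results/french-taxation")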

Notes:
The Massive Text Embedding Benchmark for French Taxation and the Dataset are currently in beta and may not be suitable for direct use in production. The size of the Dataset may not be sufficient to handle a wide range of queries and scenarios encountered in real-world settings.

As the Dataset grows and matures, I will provide updates and guidance on its suitability for production use cases.
posted an update 3 months ago
LegalKit Retrieval, a binary search with scalar (int8) rescoring over the French legal codes, is now available as a 🤗 Space.

This process is designed to be memory efficient and fast, with the binary index being small enough to fit in memory and the int8 index being loaded as a view. Additionally, the binary index is much faster (up to 32x) to search than the float32 index, while the rescoring is also extremely efficient.

This Space also showcases tsdae-lemone-mbert-base, a BERT-based sentence embedding model fitted with a Transformer-based Sequential Denoising Auto-Encoder (TSDAE) for unsupervised sentence embedding learning, with one objective: French legal domain adaptation.
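
To make the process concrete, here is a rough sketch of binary search followed by int8 rescoring with sentence-transformers and faiss; the corpus, query, and candidate count are placeholders, and this is not the exact code behind the Space:

import numpy as np
import faiss
from sentence_transformers import SentenceTransformer
from sentence_transformers.quantization import quantize_embeddings

# Embed a (placeholder) corpus with the model showcased above.
model = SentenceTransformer("louisbrulenaudet/tsdae-lemone-mbert-base")
corpus = ["Article 150-0 A ...", "Article 787 B ..."]  # placeholder legal articles
embeddings = model.encode(corpus, normalize_embeddings=True)

# Binary index: 1 bit per dimension, searched with Hamming distance (small and fast).
binary_embeddings = quantize_embeddings(embeddings, precision="ubinary")
index = faiss.IndexBinaryFlat(embeddings.shape[1])
index.add(binary_embeddings)

# int8 embeddings kept aside (loaded as a view in the Space) for the rescoring pass.
int8_embeddings = quantize_embeddings(embeddings, precision="int8")

# Retrieve a generous candidate set from the binary index...
query = model.encode(["Transmission d'entreprise et pacte Dutreil"], normalize_embeddings=True)
query_binary = quantize_embeddings(query, precision="ubinary")
_, candidates = index.search(query_binary, min(40, len(corpus)))

# ...then rescore the candidates against the float query with the int8 vectors
# (approximate: the int8 values are used as-is, without dequantisation) and keep the top 10.
scores = query @ int8_embeddings[candidates[0]].astype(np.float32).T
top_indices = candidates[0][np.argsort(-scores[0])][:10]
print([corpus[i] for i in top_indices])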

Link to the 🤗 Space : louisbrulenaudet/legalkit-retrieval

Notes:
The SentenceTransformer model currently in use is in beta and may not be suitable for direct use in production.
posted an update 3 months ago
To date, louisbrulenaudet/Maxine-34B-stock is the "Best 🤝 base merges and moerges model of around 30B" on the Open LLM Leaderboard ❤️‍🔥

It is a practical application of the model_stock merge method recently implemented by @arcee-ai in MergeKit:
models:
    - model: ConvexAI/Luminex-34B-v0.2
    - model: fblgit/UNA-34BeagleSimpleMath-32K-v1
merge_method: model_stock
base_model: abacusai/Smaug-34B-v0.1
dtype: bfloat16
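
If memory serves, a config like this is executed with MergeKit's mergekit-yaml command-line entry point, pointed at the YAML file and an output directory; the merged weights can then be pushed to the Hub as usual.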

Model : louisbrulenaudet/Maxine-34B-stock
LLM Leaderboard best models ❤️‍🔥 Collection : open-llm-leaderboard/llm-leaderboard-best-models-652d6c7965a4619fb5c27a03
replied to victor's post 5 months ago