NePe (Peter Kis)

liked a model 5 months ago

mistralai/Mistral-Small-Instruct-2409

Updated Oct 16, 2024 • 123k • 379

upvoted 2 papers 7 months ago

Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence

Paper • 2404.05892 • Published Apr 8, 2024 • 33

GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression

Paper • 2407.12077 • Published Jul 16, 2024 • 55

New activity in google/gemma-2-27b-it 8 months ago

with load_in_4bit it just generates <pad> tokens

2

#16 opened 8 months ago by

NePe

New activity in rainjay/gemma-2-27b-it-4bit 8 months ago

it just keeps generating <pad> tokens

3

#1 opened 8 months ago by

NePe

liked a model 8 months ago

rainjay/gemma-2-27b-it-4bit

Text Generation • Updated Jun 28, 2024 • 11 • 3

reacted to santiviquez's post with 🔥 9 months ago

Post

1567

I ran 580 experiments (yes, 580 🤯) to check if we can quantify data drift's impact on model performance using only drift metrics.

For these experiments, I built a technique that relies on drift signals to estimate model performance. I compared its results against the current SoTA performance estimation methods and checked which technique performs best.

The plot below summarizes the general results. It measures the quality of performance estimation versus the absolute performance change (lower is better).

Full experiment: https://www.nannyml.com/blog/data-drift-estimate-model-performance

In it, I describe the setup, datasets, models, benchmarking methods, and the code used in the project.

liked a model 9 months ago

CohereForAI/aya-23-35B

Text Generation • Updated Oct 30, 2024 • 5.64k • 271

reacted to andrewrreed's post with ❤️ 10 months ago

Post

2590

🔬 Open LLM Progress Tracker 🔬

Inspired by the awesome work from @mlabonne , I created a Space to monitor the narrowing gap between open and proprietary LLMs as scored by the LMSYS Chatbot Arena ELO ratings 🤗

The goal is to have a continuously updated place to easily visualize these rapidly evolving industry trends 🚀

🔗 Open LLM Progress Tracker: andrewrreed/closed-vs-open-arena-elo
🔗 Source of Inspiration: https://www.linkedin.com/posts/maxime-labonne_arena-elo-graph-updated-with-new-models-activity-7187062633735368705-u2jB/

2 replies

reacted to davanstrien's post with 🤗🔥 10 months ago

Post

1674

As part of the Data is Better Together MPEP project, we are now at the point where some translation efforts have successfully translated 500 highly ranked prompts into a new target language (amazing work from @Rijgersberg et al.!)

Our next step is to use these translated prompts to evaluate the performance of LLMs for non-English languages.

Does LLM-as-a-judge work outside of English?

Ideally, it would be compelling to leverage LLMs to judge models for non-English languages, since this significantly lowers the barrier to evaluating models (although it doesn't remove this barrier altogether).

What we want to know is:
- does auto/LLM eval work in general for a particular language?
- which model(s) work best as a judge?
- do LLMs' judgments of non-English models match human preferences?

We're starting to think about how to approach this. If you have any ideas for possible approaches, feel free to comment or join the discussion here: https://github.com/huggingface/data-is-better-together/issues/61

Other ideas...

Could an approach like Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models (2404.18796) work with the SOTA models for a particular language? I.e., choose 4 of the best open LLMs for Arabic and use those as the pool of raters rather than relying on one powerful judge LLM?
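
To make the "pool of raters" idea concrete, here is a minimal sketch of how verdicts from several judge models could be combined. The `Judge` interface, the `panel_verdict` helper, and the stub judges are all hypothetical, and majority voting is an assumption: the post does not say how the panel's ratings would be aggregated.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical interface: a judge maps (prompt, response) to a verdict label.
# In practice each judge would wrap a call to a different LLM.
Judge = Callable[[str, str], str]


def panel_verdict(judges: Sequence[Judge], prompt: str, response: str) -> str:
    """Return the most common verdict across the panel.

    Majority voting is an assumption of this sketch; ties go to the
    verdict that was seen first.
    """
    votes = [judge(prompt, response) for judge in judges]
    return Counter(votes).most_common(1)[0][0]


def make_stub_judge(verdict: str) -> Judge:
    """Build a placeholder judge that always returns the same verdict."""
    return lambda prompt, response: verdict


# Four stand-in judges, mirroring the "4 of the best open LLMs" in the post.
panel = [
    make_stub_judge("good"),
    make_stub_judge("good"),
    make_stub_judge("bad"),
    make_stub_judge("good"),
]
print(panel_verdict(panel, "example prompt", "example response"))
```

Swapping the stubs for real model calls leaves the aggregation unchanged, which is what makes it easy to compare a panel against a single strong judge on the same translated prompts.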

reacted to HugoLaurencon's post with 🚀 10 months ago

Post

3092

We release Idefics2-8B, a foundation vision language model with SOTA results for its size on many benchmarks.

For Idefics2, we adopted a simple architecture:
-Images are fed to a vision encoder, then to a modality projection to match the input dimension of the LLM, and finally to a perceiver resampler for efficient pooling.
-Interleaved image-text data are then passed to the LLM.

During the pre-training:
-The modality projection and perceiver resampler weights are newly initialized.
-We start with pre-trained models for the vision encoder and the LLM, and continue the training with LoRA.
-In total, we see 1.5B images!

We pre-train on 3 types of data, all publicly available:
-Interleaved image-text documents: our dataset OBELICS (HuggingFaceM4/OBELICS)
-Image caption pairs: only synthetic captions!
-PDF documents: IDL and PDFA

We kept the aspect ratio of the images with the Patch n' Pack strategy, with a resolution of up to 980x980.
At inference, it's also more efficient for lower-resolution images.

For the SFT, we build The Cauldron, a collection of 50 high-quality datasets in the user/assistant format.
It is a ready-to-use dataset for the fine-tuning of any VLM.
HuggingFaceM4/the_cauldron

Most current models, like LLaVA-NeXT, encode images with an excessive number of tokens, like 2880.
Instead, we focus on being efficient at inference by training on a mix of images encoded with 64 tokens and 320 tokens.
The result is that we perform favorably compared to the best models in our size class, while being efficient at inference.
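
As a rough illustration of why the per-image token count matters at inference, the sketch below counts how many images fit into a fixed context window at each of the token counts mentioned above. The 8192-token window and the `images_per_context` helper are assumptions made for the arithmetic, not figures from the post, and the calculation ignores text tokens entirely.

```python
def images_per_context(context_tokens: int, tokens_per_image: int) -> int:
    """Number of whole images that fit if the context held only image tokens."""
    return context_tokens // tokens_per_image


CONTEXT_TOKENS = 8192  # assumed window size, for illustration only

# 2880 is the LLaVA-NeXT-style count cited in the post; 320 and 64 are the
# two encodings the post says Idefics2 was trained on.
for tokens_per_image in (2880, 320, 64):
    fit = images_per_context(CONTEXT_TOKENS, tokens_per_image)
    print(f"{tokens_per_image} tokens/image: {fit} images")
```

Per image, 2880 tokens is 9 times the 320-token encoding and 45 times the 64-token one.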

liked a model 11 months ago

1bitLLM/bitnet_b1_58-3B

Text Generation • Updated Mar 29, 2024 • 1.24k • 243

reacted to shumingma's post with ❤️ 11 months ago

Post

2698

The Era of 1-bit LLMs: Training Tips, Code and FAQ

https://github.com/microsoft/unilm/blob/master/bitnet/The-Era-of-1-bit-LLMs__Training_Tips_Code_FAQ.pdf

We present details and tips for training 1-bit LLMs. We also provide additional experiments and results that were not previously reported, along with responses to questions about the "The Era of 1-bit LLMs" paper. Finally, we include the official PyTorch implementation of BitNet (b1.58 and b1) for future research and development of 1-bit LLMs.

2 replies

replied to macadeliccc's post 12 months ago

The Colab link seems to be for HQQ quantization.

reacted to urchade's post with ❤️ 12 months ago

Post

Hi everyone,

I'd like to share our project on open-type Named Entity Recognition (NER). Our model uses a transformer encoder (BERT-like), which keeps the computational overhead minimal compared to using LLMs. I've developed a demo that runs on CPU on Google Colab.

Colab Demo: https://colab.research.google.com/drive/1mhalKWzmfSTqMnR0wQBZvt9-ktTsATHB?usp=sharing

Code: https://github.com/urchade/GLiNER

Paper: https://arxiv.org/abs/2311.08526

8 replies

reacted to victor's post with 🤗 about 1 year ago

Post

🌠 Let's try to figure out this one as a community.
What reactions should we add to Posts and discussions?

If the reaction you want is already in the replies, give it a thumbs up (👍); if it's not, just add it as a reply.