Tom-Neverwinter

AI & ML interests

Making improvements to help the world.

Organizations

None yet

Tom-Neverwinter's activity

reacted to csabakecskemeti's post with 🔥 about 1 month ago

Post

1474

I've built a small utility that splits safetensors files, file by file.
The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16.
The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.

BTW: I'll upload the bf16 version here:
DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16
(it will take a while - days with my upload speed)
If anyone has access to the resources to test it, I'd appreciate feedback on whether it works.

The tool is available here:
https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py
It splits every file into n pieces, by layer where possible, and creates a new "model.safetensors.index.json" file.
I've tested it with Llama 3.1 8B and multiple split sizes, and validated the results by running an inference pipeline.
Use --help for usage.
Please note that the current version expects the model to already be split into multiple files and to have a "model.safetensors.index.json" layer-to-safetensors mapping file.
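
To make the index-file step concrete, here is a minimal sketch of how a "weight_map" (the tensor-name-to-file mapping stored in "model.safetensors.index.json") could be re-planned so that each shard becomes up to n pieces without splitting a layer across pieces. This is not the linked tool's code: the function names, the "-partXofY" file naming, and the ".layers.N." name pattern are all assumptions made for illustration, and the sketch only plans the mapping without moving any tensor data.

```python
import json
import re
from collections import OrderedDict

# Assumption: layer tensors are named like "model.layers.12.mlp.up_proj.weight".
LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def layer_key(tensor_name):
    """Return the layer number found in a tensor name, or -1 for non-layer tensors."""
    match = LAYER_RE.search(tensor_name)
    return int(match.group(1)) if match else -1


def split_weight_map(weight_map, n_pieces):
    """Reassign each shard's tensors to at most n_pieces new shards, keeping layers whole."""
    by_file = OrderedDict()
    for name, filename in weight_map.items():
        by_file.setdefault(filename, []).append(name)

    new_map = {}
    for filename, names in by_file.items():
        # Group tensors by layer so one layer never straddles two pieces.
        groups = OrderedDict()
        for name in sorted(names, key=lambda t: (layer_key(t), t)):
            groups.setdefault(layer_key(name), []).append(name)
        group_list = list(groups.values())
        pieces = min(n_pieces, len(group_list))
        # Assumption: every shard filename ends in ".safetensors".
        stem = filename[: -len(".safetensors")]
        for i, group in enumerate(group_list):
            piece = i * pieces // len(group_list)
            for name in group:
                # Hypothetical naming scheme for the new, smaller shards.
                new_map[name] = f"{stem}-part{piece + 1}of{pieces}.safetensors"
    return new_map


# Toy weight_map standing in for a real index file.
weight_map = {
    "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.0.mlp.up_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.1.self_attn.q_proj.weight": "model-00001-of-00002.safetensors",
    "model.layers.2.self_attn.q_proj.weight": "model-00002-of-00002.safetensors",
    "lm_head.weight": "model-00002-of-00002.safetensors",
}
new_map = split_weight_map(weight_map, 2)
# A real index also records "total_size" under "metadata"; it is left out here.
print(json.dumps({"metadata": {}, "weight_map": new_map}, indent=2))
```

Grouping by layer before assigning pieces is what keeps each layer's tensors in a single file, which matches the "by layer where possible" behaviour described above.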

New activity in Apollo-LMMs/README about 1 month ago

model pulled

#1 opened about 1 month ago by

Tom-Neverwinter

reacted to tomaarsen's post with ❤️🚀🔥 4 months ago

Post

7011

📣 Sentence Transformers v3.2.0 is out, marking the biggest release for inference in 2 years! 2 new backends for embedding models: ONNX (+ optimization & quantization) and OpenVINO, allowing for speedups up to 2x-3x AND Static Embeddings for 500x speedups at 10-20% accuracy cost.

1️⃣ ONNX Backend: This backend uses the ONNX Runtime to accelerate model inference on both CPU and GPU, reaching up to 1.4x-3x speedup depending on the precision. We also introduce 2 helper methods for optimizing and quantizing models for (much) faster inference.
2️⃣ OpenVINO Backend: This backend uses Intel's OpenVINO instead, outperforming ONNX in some situations on CPU.

Usage is as simple as SentenceTransformer("all-MiniLM-L6-v2", backend="onnx"). Does your model not have an ONNX or OpenVINO file yet? No worries - it'll be autoexported for you. Thank me later 😉

🔒 Another major new feature is Static Embeddings: think word embeddings like GloVe and word2vec, but modernized. Static Embeddings are bags of token embeddings that are summed together to create text embeddings, allowing for lightning-fast embeddings that don't require any neural networks. They're initialized in one of 2 ways:

1️⃣ Via Model2Vec, a new technique for distilling any Sentence Transformer model into static embeddings. You can either load a pre-distilled model with from_model2vec or do the distillation yourself with from_distillation. It'll only take 5 seconds on GPU & 2 minutes on CPU, no dataset needed.
2️⃣ Random initialization. This requires finetuning, but finetuning is extremely quick (e.g. I trained with 3 million pairs in 7 minutes). My final model was 6.6% worse than bge-base-en-v1.5, but 500x faster on CPU.
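
As a rough illustration of the "bag of token embeddings that are summed together" idea, the toy sketch below builds a text embedding by looking up one vector per token and adding them up. It is not the Sentence Transformers implementation: the embedding table, its 3-dimensional values, and the whitespace tokenization are all hypothetical stand-ins (a real model uses a proper tokenizer and a learned or distilled table).

```python
# Hypothetical 3-dimensional token embedding table; real tables are learned or distilled.
TOKEN_EMBEDDINGS = {
    "fast": [1.0, 0.0, 0.5],
    "static": [0.0, 1.0, 0.5],
    "embeddings": [0.5, 0.5, 0.0],
}
EMBEDDING_DIM = 3


def embed(text):
    """Sum the vectors of known tokens; no neural network forward pass is involved."""
    total = [0.0] * EMBEDDING_DIM
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    for token in text.lower().split():
        vector = TOKEN_EMBEDDINGS.get(token)
        if vector is None:
            continue  # skip out-of-vocabulary tokens in this toy example
        total = [a + b for a, b in zip(total, vector)]
    return total


sentence_vector = embed("Fast static embeddings")
print(sentence_vector)
```

Because the work per text is only table lookups and additions, the cost scales with the number of tokens rather than with model depth, which is where the large CPU speedups come from.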

Full release notes: https://github.com/UKPLab/sentence-transformers/releases/tag/v3.2.0
Documentation on Speeding up Inference: https://sbert.net/docs/sentence_transformer/usage/efficiency.html

1 reply

reacted to merve's post with 🔥 4 months ago

Post

3785

Meta AI vision has been cooking @facebook
They shipped multiple models and demos for their papers at @ECCV 🤗

Here's a compilation of my top picks:
- Sapiens is a family of foundation models for human-centric depth estimation, segmentation and more. All models have open weights and demos 👏

All models have their demos and even TorchScript checkpoints!
A collection of models and demos: facebook/sapiens-66d22047daa6402d565cb2fc

- VFusion3D is a state-of-the-art model for consistent 3D generation from images

Model: facebook/vfusion3d
Demo: facebook/VFusion3D

- CoTracker is the state-of-the-art point (pixel) tracking model

Demo: facebook/cotracker
Model: facebook/cotracker

reacted to louisbrulenaudet's post with 👍 5 months ago

Post

2608

The Romulus model series has been released on Hugging Face, continually pre-trained on 34,864,949 tokens of French laws and intended to serve as a foundation for fine-tuning on labeled data 🤗

The training code, dataset and model weights are open and freely available on HF. Training ran on H100 hardware provided by Microsoft for Startups, using Unsloth AI by @danielhanchen and @shimmyshimmer 🦥

Link to the base model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1

Link to the instruct model: louisbrulenaudet/Romulus-cpt-Llama-3.1-8B-v0.1-Instruct

Link to the dataset: louisbrulenaudet/Romulus-cpt-fr

Please note that these models have not been aligned for the production of usable texts as they stand, and will certainly need to be refined for the desired tasks in order to produce satisfactory results.

1 reply

New activity in multimodalart/flux-lora-the-explorer 5 months ago

how to make a lora

#2 opened 6 months ago by

guardiancc

updated 4 models 6 months ago

New activity in meta-llama/Llama-3.1-8B-Instruct 6 months ago

Issues loading model with ooabooga textgenwebui

#20 opened 6 months ago by

Kenji776

reacted to vikhyatk's post with 🔥 6 months ago

Post

3288

🚀 Exciting news! We've just launched "Thundermoon" - the latest version of Moondream, our open-source vision language model! 🌙

Key improvements in this release:
1. Massive leap in OCR capabilities
2. Enhanced document understanding
3. Significant boosts across key metrics:
* DocVQA: 61.9 (↑103%)
* TextVQA: 60.2 (↑5.2%)
* GQA: 64.9 (↑2.9%)

What does this mean? Moondream can now tackle complex document analysis tasks with unprecedented accuracy for a model of its size. From deciphering handwritten notes to interpreting data tables, the applications are vast.

Check out the image for a glimpse of Moondream in action, effortlessly extracting insights from a 1944 sugar industry document!

Why it matters:
* Democratizing AI: As an open-source project, we're making advanced vision AI accessible to all developers.
* Efficiency: Proving that smaller models can deliver big results.
* Real-world impact: From historical document analysis to modern business intelligence, the potential use cases are exciting.

Curious to try it out? Try out the live demo here! https://moondream.ai/playground

4 replies

replied to Xenova's post 6 months ago

Still have yet to get this to work locally, even when following the instructions posted on Reddit.

New activity in Xenova/whisper-speaker-diarization 6 months ago

how do we run this?

#2 opened 6 months ago by

Tom-Neverwinter

liked a Space 7 months ago

Open LLM Leaderboard

Track, rank and evaluate open LLMs and chatbots

reacted to lamhieu's post with 😔 7 months ago

Post

4283

🎉 The Ghost 8B Beta model outperforms prominent models such as Llama 3 8B Instruct and GPT 3.5 Turbo on the lc_winrate score. It also outperforms Claude 3 Opus, Claude 3 Sonnet, GPT-4, and Mistral Large when comparing the winrate score of AlpacaEval 2.0.

Ghost 8B Beta is a large language model developed with goals that include excellent multilingual support, superior knowledge capabilities, and cost-effectiveness. The model comes in two context length versions, 8k and 128k, along with multilingual function tools support by default.
The languages supported are 🇺🇸 English, 🇫🇷 French, 🇮🇹 Italian, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇩🇪 German, 🇻🇳 Vietnamese, 🇰🇷 Korean and 🇨🇳 Chinese.

Explore the Potential:
To learn more about this groundbreaking language model, visit the official website or explore the online demo platforms:
- Ghost 8B Beta (β, 8k) on Spaces: lamhieu/ghost-8b-beta-8k
- Ghost 8B Beta (β, 128k) on Spaces: lamhieu/ghost-8b-beta-128k
- Official website: https://ghost-x.org/docs/models/ghost-8b-beta

44 replies

New activity in lmstudio-community/DeepSeek-Coder-V2-Lite-Instruct-GGUF 7 months ago

GGUF for the 236B model

#4 opened 7 months ago by

amarmir

New activity in open-llm-leaderboard/open_llm_leaderboard 7 months ago

WizardLM-8x22B Evaluation failed

#823 opened 7 months ago by

llama-anon