
Akash Singh

akashicmarga

AI & ML interests

Conversational AI


Organizations

Spaces-explorers · ZeroGPU Explorers · Saarthi.AI · MLX Community · Social Post Explorers · Smol Community

akashicmarga's activity

reacted to clem's post with 🚀 26 days ago
I've been in Brazil for 10 days now 🇧🇷🇧🇷🇧🇷

I've been surprised by the gap between the massive number of people interested in AI (chatgpt adoption is crazy here) and the relatively low number of real AI builders - aka people and companies building their own AI models, datasets and apps.

Lots of effort is needed across the world for everyone to participate in, control, and benefit from this foundational technology, starting with open-source & multilingual AI, more access to GPUs, and AI builder training for all!
reacted to maxiw's post with 👍 26 days ago
You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open the browser and give the page a moment to load
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Elements are located from natural-language descriptions;
    # model_name selects which vision model grounds each step
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")


Currently these models are integrated via the Gradio Spaces API. Local inference support is planned soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
reacted to Jaward's post with 👍 26 days ago
reacted to akhaliq's post with 👍 7 months ago
Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.
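
For intuition about what "early-fusion token-based" means in practice, here is a rough, self-contained sketch (not Chameleon's code): image content is assumed to be pre-quantized into discrete ids by some image tokenizer, offset into a joint vocabulary, and modeled by a single transformer together with the text tokens. Vocabulary sizes and the tiny backbone are illustrative assumptions.

# Minimal sketch of early fusion: image and text share one token stream.
# The image tokenizer, vocab sizes, and tiny transformer are illustrative assumptions.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB, D_MODEL = 32000, 8192, 256

class EarlyFusionLM(nn.Module):
    def __init__(self):
        super().__init__()
        # One embedding table over the joint vocabulary: text ids first, image ids after
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(D_MODEL, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):
        # tokens: (batch, seq) ids drawn from the joint text+image vocabulary
        h = self.backbone(self.embed(tokens))
        return self.lm_head(h)  # next-token logits over text AND image tokens

# Interleave text ids with (offset) image ids into one arbitrary-order sequence
text_ids = torch.randint(0, TEXT_VOCAB, (1, 16))
image_ids = torch.randint(0, IMAGE_VOCAB, (1, 32)) + TEXT_VOCAB  # offset into joint vocab
mixed = torch.cat([text_ids, image_ids, text_ids], dim=1)

logits = EarlyFusionLM()(mixed)
print(logits.shape)  # (1, 64, TEXT_VOCAB + IMAGE_VOCAB)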
reacted to merve's post with 👍 7 months ago
I got asked about PaliGemma's document understanding capabilities, so I built a Space that has all the PaliGemma fine-tuned doc models 📄📊📖
merve/paligemma-doc
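
To query the Space from code rather than the web UI, the generic gradio_client pattern below should work; the endpoint names and arguments are specific to how this Space is built, so they are left to view_api() rather than guessed here.

# Generic sketch for calling a Gradio Space from Python; endpoint names and
# arguments vary per Space, so inspect them first with view_api().
from gradio_client import Client

client = Client("merve/paligemma-doc")
client.view_api()  # prints the Space's callable endpoints and their parameters
# result = client.predict(..., api_name="...")  # fill in based on view_api() output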
reacted to albertvillanova's post with 🚀 8 months ago
🚀 We recently released datasets 2.19.0! 📦

🔥 What's New:
- Polars integration 🐻‍❄️
- fsspec support for conversion to JSON, CSV, and Parquet
- Mode parameter for Image feature
- CLI function to convert script-datasets to Parquet
- Dataset.take and Dataset.skip

Plus, a bunch of general improvements & bug fixes!

Check out the release notes: https://github.com/huggingface/datasets/releases/tag/2.19.0

Upgrade now and power up your data workflows! 💥
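
A quick sketch of a few of the additions listed above ("imdb" and the output path are placeholders; exact option names are worth checking against the 2.19.0 release notes):

# Sketch of some datasets 2.19.0 additions; dataset name and path are placeholders.
from datasets import load_dataset

ds = load_dataset("imdb", split="train")

# New Dataset.take / Dataset.skip
first_100 = ds.take(100)
rest = ds.skip(100)

# Polars integration
df = first_100.to_polars()

# fsspec-backed export: the path can be local or remote (s3://, gs://, ...)
first_100.to_parquet("imdb_sample.parquet")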
reacted to Jaward's post with 👍 8 months ago
mlx_micrograd - an MLX port of Karpathy's micrograd: a tiny scalar-valued autograd engine with a small PyTorch-like neural network library on top.

https://github.com/Jaykef/mlx_micrograd
Installation
pip install mlx_micrograd

Example usage
Example showing a number of possible supported operations:
from mlx_micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data}') # prints array(24.7041, dtype=float32), the outcome of this forward pass
g.backward()
print(f'{a.grad}') # prints array(138.834, dtype=float32), i.e. the numerical value of dg/da
print(f'{b.grad}') # prints array(645.577, dtype=float32), i.e. the numerical value of dg/db

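Since the port keeps micrograd's interface (.data, .grad, .backward()), a manual gradient step can be sketched with nothing beyond the Value API shown above; the quadratic loss, the 0.01 learning rate, and the assumption that Value accepts the array stored in .data are all illustrative.

# One gradient step using only the Value API demonstrated above.
from mlx_micrograd.engine import Value

w = Value(3.0)
loss = (w * 2.0 - 5.0) ** 2   # toy loss: (2w - 5)^2, minimized at w = 2.5
loss.backward()
print(loss.data)   # current loss value, 1.0 at w = 3
print(w.grad)      # d(loss)/dw = 4*(2w - 5) = 4 at w = 3
w_new = Value(w.data - 0.01 * w.grad)  # manual SGD update (assumes Value accepts the array held in .data)
print(w_new.data)  # 2.96
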
replied to werewolf5's post 8 months ago
reacted to nateraw's post with 🔥 8 months ago
reacted to vikhyatk's post with 🔥 8 months ago
Updated the vikhyatk/lnqa dataset to include images, so you no longer need to separately download them from OpenImages!
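
A quick way to peek at the bundled images (a generic datasets sketch; the column names are not verified here, hence the keys() inspection):

# Streaming avoids downloading the full dataset just to inspect one example.
from datasets import load_dataset

ds = load_dataset("vikhyatk/lnqa", split="train", streaming=True)
example = next(iter(ds))
print(example.keys())  # inspect the actual columns; the image should now be included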
reacted to akhaliq's post with 👍 8 months ago
CatLIP

CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data

CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data (2404.15653)

Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. However, pairwise similarity computation in contrastive loss between image and text pairs poses computational challenges. This paper presents a novel weakly supervised pre-training of vision models on web-scale image-text data. The proposed method reframes pre-training on image-text data as a classification task. Consequently, it eliminates the need for pairwise similarity computations in contrastive loss, achieving a remarkable 2.7× acceleration in training speed compared to contrastive learning on web-scale data. Through extensive experiments spanning diverse vision tasks, including detection and segmentation, we demonstrate that the proposed method maintains high representation quality.
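
To make the classification reframing concrete, the sketch below replaces pairwise image-text similarity with a multi-label loss over a vocabulary of caption-derived concepts; the concept vocabulary size, the toy encoder, and the random data are placeholders for illustration, not the paper's exact recipe.

# Sketch of "pre-training as classification": predict which caption-derived concepts
# (e.g. extracted nouns) apply to each image, using multi-label BCE instead of a
# pairwise contrastive loss. Vocabulary size, encoder, and data are placeholders.
import torch
import torch.nn as nn

NUM_CONCEPTS = 1000  # placeholder concept vocabulary built from caption nouns

encoder = nn.Sequential(           # stand-in for a vision backbone
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 256),
    nn.ReLU(),
    nn.Linear(256, NUM_CONCEPTS),  # one logit per concept
)
criterion = nn.BCEWithLogitsLoss()  # multi-label loss: no image-text pairwise similarities

images = torch.randn(8, 3, 32, 32)                         # fake image batch
targets = torch.randint(0, 2, (8, NUM_CONCEPTS)).float()   # multi-hot concept labels per caption

logits = encoder(images)
loss = criterion(logits, targets)
loss.backward()
print(loss.item())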