George Duan PRO

cctuan

AI & ML interests

None yet

Recent Activity

liked a Space 18 days ago

fishaudio/fish-speech-1

liked a Space about 2 months ago

huggingface/open-source-ai-year-in-review-2024

liked a Space 2 months ago

multimodalart/flux-lora-lab

cctuan's activity

liked a Space 18 days ago

443

Fish Speech 1

🏆

Generate speech from text

liked a Space about 2 months ago

528

Open Source AI Year In Review 2024

😻

What happened in open-source AI this year, and what’s next?

liked a Space 2 months ago

459

FLUX LoRa Lab

🧪

Generate images using selected LoRAs and prompts

updated 2 models 2 months ago

cctuan/gys1217

Text-to-Image • Updated Dec 17, 2024

cctuan/gys25

Text-to-Image • Updated Dec 15, 2024 • 2

liked a Space 2 months ago

250

MaskGCT TTS Demo

😻

reacted to m-ric's post with ❤️ 3 months ago

Post

2393

Single most important thing to do today: go try QwQ on Hugging Chat!

👉 https://huggingface.co/chat/models/Qwen/QwQ-32B-Preview

2 replies

reacted to davanstrien's post with ❤️ 3 months ago

Post

2503

First dataset for the new Hugging Face Bluesky community organisation: https://huggingface.co/datasets/bluesky-community/one-million-bluesky-posts 🦋

📊 1M public posts from Bluesky's firehose API
🔍 Includes text, metadata, and language predictions
🔬 Perfect to experiment with using ML for Bluesky 🤗

Excited to see people build more open tools for a more open social media platform!

reacted to maxiw's post with 👍 3 months ago

Post

2223

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

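import time
from askui import VisionAgent

# Each click names the Hub model that should locate the described element.
with VisionAgent() as agent:
    # Open the browser and pause briefly so the page can render.
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Find the search field from a natural-language description, then search.
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    # Switch to the image results and pick the second image.
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")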

Currently these models are integrated with Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B

3 replies

liked 2 Spaces 3 months ago

206

ACE-Chat

🪄

(Tongyi Lab) ACE: All-round Creator and Editor

JoyType

🔥

liked a Space 4 months ago

334

TANGO

🐠

Co-Speech Gesture Video Generation

reacted to singhsidhukuldeep's post with 👀 4 months ago

Post

2164

While Google's Transformer might have introduced "Attention is all you need," Microsoft and Tsinghua University are here with the DIFF Transformer, stating, "Sparse-Attention is all you need."

The DIFF Transformer outperforms traditional Transformers in scaling properties, requiring only about 65% of the model size or training tokens to achieve comparable performance.

The secret sauce? A differential attention mechanism that amplifies focus on relevant context while canceling out noise, leading to sparser and more effective attention patterns.

How?
- It uses two separate softmax attention maps and subtracts them.
- It employs a learnable scalar λ for balancing the attention maps.
- It implements GroupNorm for each attention head independently.
- It is compatible with FlashAttention for efficient computation.
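
The subtraction described in the list above can be sketched in a few lines of plain Python. This is a minimal single-head illustration, not the authors' implementation: the helper names (`attention_map`, `rms_norm`, `diff_attention`) and the toy inputs are hypothetical, λ is passed in as a fixed float rather than learned, the per-head normalization is approximated by a simple RMS normalization with no learnable gain, and nothing here uses FlashAttention.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention_map(q, k):
    """softmax(Q K^T / sqrt(d)), computed row by row."""
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    return [softmax([s / scale for s in row]) for row in matmul(q, k_t)]


def rms_norm(row, eps=1e-6):
    """Stand-in for the per-head normalization (assumption: no learnable gain)."""
    rms = math.sqrt(sum(x * x for x in row) / len(row) + eps)
    return [x / rms for x in row]


def diff_attention(q1, k1, q2, k2, v, lam):
    """Return (normalized output, differential attention map) for one head."""
    a1 = attention_map(q1, k1)
    a2 = attention_map(q2, k2)
    # Subtract the second softmax map, scaled by lam, from the first.
    diff = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    out = [rms_norm(row) for row in matmul(diff, v)]
    return out, diff


# Toy single-head inputs: 2 tokens, dimension 2 (hypothetical values).
q1 = [[1.0, 0.0], [0.0, 1.0]]
k1 = [[2.0, 0.0], [0.0, 2.0]]
q2 = [[0.5, 0.5], [0.5, 0.5]]
k2 = [[1.0, 1.0], [1.0, 0.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

out, diff_map = diff_attention(q1, k1, q2, k2, v, lam=0.5)
print(diff_map)
print(out)
```

Because every softmax row sums to 1, each row of the differential map sums to 1 − λ. When both maps are identical and λ = 1 they cancel completely, which is the noise-canceling intuition in miniature.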

What do you get?
- Superior long-context modeling (up to 64K tokens).
- Enhanced key information retrieval.
- Reduced hallucination in question-answering and summarization tasks.
- More robust in-context learning, less affected by prompt order.
- Mitigation of activation outliers, opening doors for efficient quantization.

Extensive experiments show DIFF Transformer's advantages across various tasks and model sizes, from 830M to 13.1B parameters.

This innovative architecture could be a game-changer for the next generation of LLMs. What are your thoughts on DIFF Transformer's potential impact?