Donkey Small

DonkeySmall

AI & ML interests

None yet

Recent Activity

liked a Space 2 months ago
ZhengPeng7/BiRefNet_demo

Organizations

OCR-Datasets

DonkeySmall's activity

reacted to schuler's post with ❤️ about 6 hours ago
🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":
┌───────────────────────┐
│     Input Layer       │
├───────────────────────┤
│ Token & Positional    │
│     Embedding         │
├───────────────────────┤
│   12x Transformer     │
│      Blocks           │
│  - 12 heads           │
│  - 768 hidden dims    │
│  - 3072 intermediate  │
├───────────────────────┤
│   Output Layer        │
└───────────────────────┘

Clean Pascal Implementation
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;
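
The architecture numbers in the post (12 layers, 768 hidden dims, 3072 intermediate) pin down an approximate parameter count. A quick back-of-the-envelope sketch in Python; note the vocabulary size and context length are assumptions taken from the GPT-3 paper, not from the post:

```python
# Rough parameter count for the GPT-3 Small configuration described above.
d_model  = 768             # hidden dimensions (from the post)
n_layers = 12              # transformer blocks (from the post)
d_ff     = 4 * d_model     # 3072 intermediate dimensions (from the post)
vocab    = 50257           # assumed: GPT-3 BPE vocabulary size
context  = 2048            # assumed: GPT-3 context window

embed   = vocab * d_model + context * d_model  # token + positional embeddings
attn    = 4 * d_model * d_model                # Q, K, V and output projections
ffn     = 2 * d_model * d_ff                   # up- and down-projection
per_blk = attn + ffn                           # weights per block (biases/norms omitted)
total   = embed + n_layers * per_blk

print(f"{total / 1e6:.0f}M parameters")  # prints "125M parameters"
```

This lands on the familiar ~125M figure reported for GPT-3 Small, confirming the snippet really is that configuration.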

reacted to lorraine2's post with 👍 2 months ago
New NVIDIA paper: ⚡ Multi-student Diffusion Distillation for Better One-step Generators ⚡

Do you want to make your diffusion models (a) run in a single step, (b) run with a smaller model, and (c) have improved quality, simultaneously? Check out our multi-student distillation (MSD) method, which is simple and applicable to most diffusion models! The only catch is that we now have to distill (and store) a mixture of expert student generators.

Explore the MSD project page to learn more: https://research.nvidia.com/labs/toronto-ai/MSD/

Work led by Yanke Song along with Weili Nie, Karsten Kreis and James Lucas

Check out more work from the Toronto AI Lab here: https://research.nvidia.com/labs/toronto-ai/
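
The "mixture of student generators" idea can be illustrated with a toy routing sketch. Everything below is a hypothetical stand-in, not the paper's code: the point is only that each input is sent to one specialist student, so inference costs a single forward pass of one small one-step network.

```python
import zlib

# Toy illustration of serving a mixture of distilled one-step students.
# The functions and crc32-based routing are illustrative stand-ins.

def make_student(student_id):
    # Stand-in for a small one-step generator distilled on a data subset.
    def generate(prompt):
        return f"student-{student_id} sample for {prompt!r}"
    return generate

N_STUDENTS = 4
students = [make_student(i) for i in range(N_STUDENTS)]

def route(prompt):
    # Deterministic partition of the input space; in MSD the *training*
    # data is partitioned (e.g. by class) so each student specializes.
    return zlib.crc32(prompt.encode()) % N_STUDENTS

def generate(prompt):
    # Only the selected student runs: one step, one small model.
    return students[route(prompt)](prompt)

print(generate("a photo of a cat"))
```

The storage catch mentioned in the post is visible here: all N student networks must be kept around, even though only one runs per sample.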
  • 1 reply
ยท
reacted to rwightman's post with 🚀 3 months ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below; the differences are small, but the Adopt 180-epoch run lands closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and with the Adopt vs AdamW optimizer.

reacted to gokaygokay's post with 👍 7 months ago