Donkey Small

DonkeySmall

AI & ML interests

None yet

Recent Activity

liked a Space 2 months ago
ZhengPeng7/BiRefNet_demo

Organizations

OCR-Datasets

DonkeySmall's activity

reacted to schuler's post with ❤️ about 6 hours ago
🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":
┌───────────────────────┐
│     Input Layer       │
├───────────────────────┤
│ Token & Positional    │
│     Embedding         │
├───────────────────────┤
│   12x Transformer     │
│      Blocks           │
│  - 12 heads           │
│  - 768 hidden dims    │
│  - 3072 intermediate  │
├───────────────────────┤
│   Output Layer        │
└───────────────────────┘

Clean Pascal Implementation
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;
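
The architecture numbers in the post (12 layers, 768 hidden dims, 3072 intermediate) pin down an approximate parameter count. A quick back-of-the-envelope sketch in Python; note the vocabulary size and context length are assumptions taken from the GPT-3 paper, not from the post:

```python
# Rough parameter count for the GPT-3 Small configuration described above.
d_model  = 768             # hidden dimensions (from the post)
n_layers = 12              # transformer blocks (from the post)
d_ff     = 4 * d_model     # 3072 intermediate dimensions (from the post)
vocab    = 50257           # assumed: GPT-3 BPE vocabulary size
context  = 2048            # assumed: GPT-3 context window

embed   = vocab * d_model + context * d_model  # token + positional embeddings
attn    = 4 * d_model * d_model                # Q, K, V and output projections
ffn     = 2 * d_model * d_ff                   # up- and down-projection
per_blk = attn + ffn                           # weights per block (biases/norms omitted)
total   = embed + n_layers * per_blk

print(f"{total / 1e6:.0f}M parameters")  # prints "125M parameters"
```

This lands on the familiar ~125M figure reported for GPT-3 Small, confirming the snippet really is that configuration.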

reacted to lorraine2's post with 👍 2 months ago
New NVIDIA paper: ⚡ Multi-student Diffusion Distillation for Better One-step Generators ⚡

Do you want to make your diffusion models (a) run in a single step, (b) run with a smaller model, and (c) have improved quality, simultaneously? Check out our multi-student distillation (MSD) method, which is simple and applicable to most diffusion models! The only catch is that we now have to distill (and store) a mixture of expert student generators.

Explore the MSD project page to learn more: https://research.nvidia.com/labs/toronto-ai/MSD/

Work led by Yanke Song along with Weili Nie, Karsten Kreis and James Lucas

Check out more work from the Toronto AI Lab here: https://research.nvidia.com/labs/toronto-ai/
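
The "mixture of student generators" idea can be illustrated with a toy routing sketch. Everything below is a hypothetical stand-in, not the paper's code: the point is only that each input is sent to one specialist student, so inference costs a single forward pass of one small one-step network.

```python
import zlib

# Toy illustration of serving a mixture of distilled one-step students.
# The functions and crc32-based routing are illustrative stand-ins.

def make_student(student_id):
    # Stand-in for a small one-step generator distilled on a data subset.
    def generate(prompt):
        return f"student-{student_id} sample for {prompt!r}"
    return generate

N_STUDENTS = 4
students = [make_student(i) for i in range(N_STUDENTS)]

def route(prompt):
    # Deterministic partition of the input space; in MSD the *training*
    # data is partitioned (e.g. by class) so each student specializes.
    return zlib.crc32(prompt.encode()) % N_STUDENTS

def generate(prompt):
    # Only the selected student runs: one step, one small model.
    return students[route(prompt)](prompt)

print(generate("a photo of a cat"))
```

The storage catch mentioned in the post is visible here: all N student networks must be kept around, even though only one runs per sample.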
  • 1 reply
ยท
reacted to rwightman's post with 🚀 3 months ago
New MobileNetV4 weights were uploaded a few days ago -- more ImageNet-12k training at 384x384 for the speedy 'Conv Medium' models.

There are 3 weight variants here for those who like to tinker. On my hold-out eval they are ordered as below; the differences are small, but the Adopt 180-epoch run lands closer to AdamW 250 than to AdamW 180.
* AdamW for 250 epochs - timm/mobilenetv4_conv_medium.e250_r384_in12k
* Adopt for 180 epochs - timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
* AdamW for 180 epochs - timm/mobilenetv4_conv_medium.e180_r384_in12k

This was by request, as a user reported impressive results using the 'Conv Large' ImageNet-12k pretrains as object detection backbones. ImageNet-1k fine-tunes are pending; the weights do behave differently with 180 vs 250 epochs and with the Adopt vs AdamW optimizer.

reacted to gokaygokay's post with 👍 7 months ago