ThijsL202 (Thijs)

updated a collection about 22 hours ago

SPACES

Collection

2 items • Updated about 19 hours ago

liked a model 1 day ago

sophosympatheia/Evayale-v1.0

Text Generation • Updated 1 day ago • 19 • 1

liked 2 Spaces 11 days ago

Running on A100

223

🔀

redrix/AngelSlayer-12B-Unslop-Mell-RPMax-DARKNESS

Text Generation • Updated 15 days ago • 182 • 8

liked a model 18 days ago

win10/EVA-QwQ-32B-Preview

Text Generation • Updated Nov 29, 2024 • 67 • 5

New activity in win10/EVA-QwQ-32B-Preview 23 days ago

Thanks, great model.

#2 opened 23 days ago by

ThijsL202

reacted to Jaward's post with 🧠 about 1 month ago

Post

2418

Implements the compute-efficient DeepPCR algorithm, which parallelizes sequential operations to speed up inference and training of neural networks. DeepPCR can significantly reduce the time complexity of operations such as denoising in latent diffusion space from O(L) to O(log₂ L).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb
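
To see how L sequential steps can collapse into about log₂ L parallel rounds, here is a minimal sketch. It is not DeepPCR itself and is not taken from the linked notebook: it uses recursive doubling, a related classic technique, on a scalar linear recurrence x_i = a_i * x_{i-1} + b_i. The function names `sequential_recurrence` and `doubling_recurrence` are hypothetical, chosen only for illustration.

```python
import numpy as np


def sequential_recurrence(a, b, x0):
    """Baseline: L dependent steps, each waiting on the previous one."""
    x = np.empty(len(a))
    prev = x0
    for i in range(len(a)):
        prev = a[i] * prev + b[i]
        x[i] = prev
    return x


def doubling_recurrence(a, b, x0):
    """Recursive doubling (illustrative stand-in, not DeepPCR).

    Each step i is the affine map x -> a[i] * x + b[i]. Every round composes
    map i with map i - stride for all i at once, so the span covered by each
    map doubles per round and ceil(log2(L)) rounds suffice.
    """
    a = np.asarray(a, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    length = len(a)
    stride, rounds = 1, 0
    while stride < length:
        new_a, new_b = a.copy(), b.copy()
        # Composition of (a_i, b_i) after (a_j, b_j), with j = i - stride.
        new_b[stride:] = a[stride:] * b[:-stride] + b[stride:]
        new_a[stride:] = a[stride:] * a[:-stride]
        a, b = new_a, new_b
        stride *= 2
        rounds += 1
    # Every map now starts from x0, so one vectorized evaluation finishes.
    return a * x0 + b, rounds


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coeffs, offsets = rng.normal(size=8), rng.normal(size=8)
    parallel, rounds = doubling_recurrence(coeffs, offsets, 0.5)
    print(np.allclose(parallel, sequential_recurrence(coeffs, offsets, 0.5)), rounds)
```

Each round is a handful of elementwise array operations, so on parallel hardware the number of dependent steps grows with log₂ L rather than L. The total arithmetic does not shrink, which is the usual trade-off for this family of methods.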

reacted to abhishek's post with 🔥 about 2 months ago

Post

5504

INTRODUCING Hugging Face AutoTrain Client 🔥
Fine-tuning models got even easier!!!!
Now you can fine-tune SOTA models on all compatible dataset-model pairs on the Hugging Face Hub using Python on Hugging Face servers. Choose from a number of GPU flavors, millions of model and dataset pairs, and 10+ tasks 🤗

To try, install autotrain-advanced using pip. You can ignore dependencies by installing with --no-deps, but then you'd need to install some dependencies by hand.

"pip install autotrain-advanced"

Github repo: https://github.com/huggingface/autotrain-advanced

6 replies


New activity in DavidAU/MN-Dark-Horror-The-Cliffhanger-18.5B-GGUF about 2 months ago

Cydonia?

1

#1 opened about 2 months ago by

ThijsL202

updated a model 2 months ago

ThijsL202/magnum-v4-27b-GGUF-0

Updated Oct 31, 2024 • 5

liked a model 2 months ago

bartowski/EVA-Qwen2.5-32B-v0.0-GGUF

Text Generation • Updated Oct 23, 2024 • 872 • 9

updated a collection 2 months ago

Helpfull

Collection

1 item • Updated Oct 23, 2024

liked a model 2 months ago

EVA-UNIT-01/EVA-Qwen2.5-32B-v0.0

Text Generation • Updated Oct 27, 2024 • 1.28k • 24

reacted to singhsidhukuldeep's post with 🔥 2 months ago

Post

1844

Good folks at @nvidia have released exciting new research on normalized Transformers (nGPT) for faster and more efficient language modeling!

Here is what they are proposing:

1. Remove all normalization layers, like RMSNorm or LayerNorm, from the standard Transformer architecture.

2. Normalize all matrices along their embedding dimension after each training step. This includes input and output embeddings, attention matrices (Q, K, V), output projection matrices, and MLP matrices.

3. Replace the standard residual connections with normalized update equations using learnable eigen learning rates for the attention and MLP blocks.

4. Change the softmax scaling factor in the attention mechanism from 1/sqrt(d_k) to sqrt(d_k).

5. Implement rescaling and optional normalization of query (q) and key (k) vectors in the attention mechanism using learnable scaling factors.

6. Rescale the intermediate states of the MLP block using learnable scaling factors.

7. Implement rescaling of the output logits using learnable scaling factors.

8. Remove weight decay and learning rate warmup from the optimization process.

9. Initialize the eigen learning rates and scaling factors with appropriate values as specified in the paper.

10. During training, treat all vectors and matrices as residing on a unit hypersphere, interpreting matrix-vector multiplications as cosine similarities.

11. Implement the update equations for the hidden states using the normalized outputs from attention and MLP blocks, controlled by the eigen learning rates.

12. After each training step, normalize all parameter matrices to ensure they remain on the unit hypersphere.

13. Use the Adam optimizer without weight decay for training the model.

14. When computing loss, apply the learnable scaling factor to the logits before the softmax operation.

15. During inference, follow the same normalization and scaling procedures as in training.

Excited to see how it scales to larger models and datasets!
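
Steps 3, 11, and 12 in the list above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the helper names (`unit_norm`, `normalized_update`, `renormalize_weights`) are hypothetical, the eigen learning rate `alpha` is a fixed scalar here rather than a learnable parameter, and the choice of the last axis as the embedding dimension is an assumption.

```python
import numpy as np


def unit_norm(x, axis=-1, eps=1e-8):
    """Project vectors onto the unit hypersphere along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def normalized_update(h, block_out, alpha):
    """Normalized replacement for a residual connection (steps 3 and 11).

    Moves the hidden state h toward the normalized block output by a
    fraction alpha, then projects the result back onto the hypersphere.
    alpha stands in for the learnable eigen learning rate (assumption:
    a plain scalar here).
    """
    target = unit_norm(block_out)
    return unit_norm(h + alpha * (target - h))


def renormalize_weights(weights):
    """Re-project a parameter matrix after a training step (step 12).

    Assumption: rows are the vectors to normalize, so the embedding
    dimension is the last axis.
    """
    return unit_norm(weights, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden = unit_norm(rng.normal(size=(4, 16)))   # 4 tokens, 16 dims
    attn_out = rng.normal(size=(4, 16))            # stand-in for an attention block output
    hidden = normalized_update(hidden, attn_out, alpha=0.1)
    print(np.linalg.norm(hidden, axis=-1))         # norms stay close to 1
```

With `alpha=0` the hidden state is left unchanged, and with `alpha=1` it jumps to the normalized block output, so `alpha` controls how far each block moves the state along the sphere.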