Jaward Sesay

Jaward

AI & ML interests

I like to train large deep neural nets too 🧠🤖💥 | First Paper (AutoAgents: A Framework for Automatic Agent Generation) Accepted @ IJCAI 2024 | Role Model Karpathy

Recent Activity

posted an update about 2 hours ago

"the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies", deepseek researchers are so based🔥 They had an “aha moment”, a key takeaway from this is to always try out new ideas from first-principles. Paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf Code: https://github.com/deepseek-ai/DeepSeek-R1 Weights: https://huggingface.co/collections/deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

upvoted a paper about 23 hours ago

Evolving Deeper LLM Thinking

reacted to mlabonne's post with 🧠 5 days ago

🆕 LLM Course 2025 edition! I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling. The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers. I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap. Thanks everyone, hope you'll enjoy it! 💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

View all activity

Articles

In Honour of This Year's NeurIPs Test of Time Paper Awardees

Dec 10, 2024

• 2

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

Dec 2, 2024

• 5

Journey With Me Into The Mind of Large Language Models: Interesting Findings in AnthropicAI's Scaling Monosemanticity paper.

May 22, 2024

• 2

On Coding Your First Attention

Apr 21, 2024

• 7

Organizations

Jaward's activity

posted an update about 2 hours ago

Post

"the power and beauty of reinforcement learning: rather than explicitly teaching the model on how to solve a problem, we simply provide it with the right incentives, and it autonomously develops advanced problem-solving strategies", deepseek researchers are so based🔥
They had an “aha moment”, a key takeaway from this is to always try out new ideas from first-principles.

Paper: https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf
Code: https://github.com/deepseek-ai/DeepSeek-R1
Weights: deepseek-ai/deepseek-r1-678e1e131c0169c0bc89728d

reacted to mlabonne's post with 🧠 5 days ago

Post

2726

🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course

posted an update 7 days ago

Post

1823

minimal single script implementation of knowledge distillation in LLMs. In this implementation, we use GPT-2 (124M) as student model and GPT-2 Medium (340M) as teacher via reverse Kullback-Leibler (KL) divergence, trained on a small chunk of openwebtext.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/llm_knowledge_distillation.ipynb

posted an update 12 days ago

Post

1345

Huge AI win in medicine👏
"Large language of life model" just dropped!!
Full paper: https://www.nature.com/articles/s41586-024-08391-z

1 reply

posted an update 14 days ago

Post

2303

damn I love nvidia's bullish stance on taking AI to the edge - from being the overlord of compute to cutting-edge physical AI with SOTA multiverse simulation engines that brings the scaling laws under your control!!

My favorite: Cosmos - fully opensourced, open-weight physics based video gen platform, what an incredible way to start off the year✨

Code: https://github.com/NVIDIA/Cosmos
Models: nvidia/cosmos-6751e884dc10e013a0a0d8e6
Paper: https://d1qx31qr3h6wln.cloudfront.net/publications/NVIDIA%20Cosmos_2.pdf

posted an update 24 days ago

Post

2995

nanoBLT: Simplified lightweight implementation of a character-level Byte Latent Transformer model (under 500 lines of code). The model is 2x4x2 (n_layers_encoder, n_layers_latent, n_layers_decoder) layer deep trained on ~1M bytes of tiny Shakespeare with a patch size of 4.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/byte_latent_transformer.ipynb

replied to their post about 1 month ago

btw the background songs in the videos are actually what I listen to during implementation

posted an update about 1 month ago

Post

1800

Implements from first-principle a discrete flow matching model for code generation- trained a small sized 2D dfm model on two variations of code for binary search. The result was amazing, code in comment:
Code: https://github.com/Jaykef/ai-algorithms/blob/main/dfm.ipynb

1 reply

posted an update about 1 month ago

Post

603

In Honour of This Year's NeurIPs Test of Time Paper Awardees
This year's NIPs Test of Time Paper Awards went to two groundbreaking papers:
1. Generative Adversarial Nets (Goodfellow et al)
2. Sequence to Sequence Learning with Neural Networks (Ilya et al)
Let's explore how these papers helped pioneered breakthroughs in today's AI:

Full Article: https://huggingface.co/blog/Jaward/nip

posted an update about 1 month ago

Post

646

Lightweight implementation of the seminal paper “Sequence to Sequence Learning with Neural Networks”

Built, trained and eval a 2 layer deep seq2seq LSTM-based model (~10M params) on German-English corpus of Multi30K dataset. In honor of
ilya sutskever et al for winning this year’s NeurIPSConf Test of Time paper award 🫡

Code: https://github.com/Jaykef/ai-algorithms/blob/main/seq2seq.ipynb

posted an update about 2 months ago

Post

488

Rethinking Backpropagation: Thoughts on What's Wrong with Backpropagation

As a young researcher, I've often pondered the limitations of backpropagation, especially when mapped with how learning occurs in the human brain. While backpropagation has been the workhorse of deep learning, it isn't without flaws. In this post, I aim to share some thoughts on these shortcomings from first principles.

Full article
https://huggingface.co/blog/Jaward/rethinking-backpropagation

posted an update about 2 months ago

Post

2429

Implements compute-efficient DeepPCR algorithm which parallelizes sequential operations thus speeding up inference and training of neural networks. DeepPCR can significantly reduce the time complexity in operations such as denoising in latent diffusion space from O(L) to O(log2 L).

Code: https://github.com/Jaykef/ai-algorithms/blob/main/deep_pcr.ipynb

posted an update about 2 months ago

Post

1241

This is supercool!!
Explores o1-like multimodal reasoning.
Multi-agents with DPO is a nice touch 👍
Paper: https://arxiv.org/pdf/2411.14432
Code: https://github.com/dongyh20/Insight-V

posted an update 2 months ago

Post

1630

Ok RNNs can rap too:)

Here we implement the seminal RNN paper “Generating Text with Recurrent Neural Networks"- we train a character-level multiplicative recurrent neural network model (~250k params) for 1000 epochs with Adam opt on 2pac's "Hit 'em Up", sample was fun lol.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/generating_texts_with_rnns.ipynb

posted an update 2 months ago

Post

1743

Interesting Work on Reasoning 🤔
- explores a new take on few-shot reasoning while challenging assumptions that program synthesis is necessary for abstract reasoning.
- shows test-time training + smart inference tricks can match human-average performance, though at high computational cost. Key insight: proper compute allocation matters more than method (whether symbolic or neural).

Paper: https://ekinakyurek.github.io/papers/ttt.pdf

posted an update 3 months ago

Post

2111

It's work like this that in some way signal the eventual “dominance” of AI over all the sciences.

“We train our model on the six-dimensional N-body phase space, predicting particle velocities as the time derivative of the model’s displacement outputs”

The emulator is capable of predicting
the nonlinear displacement and velocity fields for 128^3 particles in half a second on a single GPU🤯

1 reply

posted an update 3 months ago

Post

1753

Triton nanoGPT now has a custom cross entropy loss kernel 🚀
Next: matmul, gradually overthrowing all major PyTorch ops:)

Simplified pseudo for parallel cross-entropy loss compute:
- init program: get pid, compute offsets, load targets.
- init row_max and row_sum.
- for-loop1 (find max logits): update row_max with max logits.
- for-loop2 (compute softmax and loss): compute row_sum, update loss.
- add log(row_sum) and store loss.

Code: https://github.com/Jaykef/ai-algorithms/blob/main/triton_nanoGPT.ipynb

posted an update 3 months ago

Post

412

This has to be the first peak performance level use case of a non-autoregressive architecture for TTS. Flow matching for the win!!

Demo: mrfakename/E2-F5-TTS
Model: SWivid/E2-TTS

posted an update 3 months ago

Post

1168

Lightweight implementation of newly introduced “Differential Transformer”:
Proposes differential attention mechanism which computes attention scores as a difference between two separate softmax attention maps thereby reducing noise in attention blocks. [[[Differential nanoGPT]]] :)

Code: https://github.com/Jaykef/ai-algorithms/blob/main/DIFF_Transformer.ipynb
YT Video: https://youtu.be/9V4mJA5y7dg

reacted to clem's post with 👍 3 months ago

Post

4166

Open-source AI creates healthy competition in a field where natural tendencies lead to extreme concentration of power. Imagine a world where only one or two companies could build software. This is the biggest risk and ethical challenge of them all IMO. Let's fight this!

3 replies