Mayank Mishra

mayank-mishra

https://mayank31398.github.io/

AI & ML interests

Large Language Models, Distributed Training and Inference

Recent Activity

upvoted a collection 11 days ago

Granite 3.1 Language Models

new activity 11 days ago

ibm-granite/granite-3.1-8b-instruct:Exceptional creative writer

authored a paper 12 days ago

Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models

Articles

Improving Hugging Face Training Efficiency Through Packing with Flash Attention

Saving Memory Using Padding-Free Transformer Layers during Finetuning

Aurora-M: The First Open Source Biden-Harris Executive Order Red teamed Multilingual Language Model

Posts 4

Post

1784

New preprint out with colleagues from MIT and IBM Research:

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention (2405.12981)

We introduce a simple mechanism for sharing keys and values across layers, which reduces the memory needed for the KV cache during inference.
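To make the memory saving concrete, here is a rough back-of-the-envelope sketch in Python. It is not taken from the paper: the function name, its parameters, and the assumption that layers are grouped into equal-sized blocks that each store one set of keys and values are all hypothetical, chosen only to illustrate how sharing across layers shrinks the cache.

```python
import math


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, sharing_factor=1):
    """Estimate KV cache size in bytes.

    sharing_factor is a hypothetical knob: the number of layers that
    reuse one stored set of keys and values (1 means no sharing).
    """
    if sharing_factor < 1:
        raise ValueError("sharing_factor must be >= 1")
    # Only one layer per sharing group needs to store its keys and values.
    unique_kv_layers = math.ceil(num_layers / sharing_factor)
    # Factor of 2 accounts for storing both keys and values.
    return (2 * unique_kv_layers * batch_size * seq_len
            * num_kv_heads * head_dim * bytes_per_elem)


if __name__ == "__main__":
    # Illustrative model shape (assumed, not from the paper); 2 bytes per element (fp16/bf16).
    baseline = kv_cache_bytes(32, 8, 128, 4096)
    shared = kv_cache_bytes(32, 8, 128, 4096, sharing_factor=2)
    print(f"baseline: {baseline / 2**20:.0f} MiB")
    print(f"shared across pairs of layers: {shared / 2**20:.0f} MiB")
```

Under these assumptions, letting every pair of layers share one set of keys and values halves the cache, since the cache size scales linearly with the number of layers that store their own keys and values.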

Post

2565

Thrilled to unveil DS-MoE: a dense training and sparse inference scheme for enhanced computational and memory efficiency in your MoE models! 🚀🚀🚀

Discover more in our blog: https://huggingface.co/blog/bpan/ds-moe and dive into the details with our paper: Dense Training, Sparse Inference: Rethinking Training of Mixture-of-Experts Language Models (2404.05567)

Collections 1

Papers 20

arxiv:2409.04787

arxiv:2408.13359

arxiv:2407.13739

arxiv:2407.09105

Models 8

mayank-mishra/granite-3b-code-glaive-20k

Text Generation • Updated Jun 5 • 28

mayank-mishra/granite-20b-code-instruct-Q4_K_M-GGUF

Text Generation • Updated May 19

mayank-mishra/starcoder-GPTQ-8bit-128g

Updated May 5, 2023 • 11

mayank-mishra/starcoder-GPTQ-4bit-128g

Updated May 5, 2023 • 16

mayank-mishra/starcoderbase-GPTQ-4bit-128g

Updated May 5, 2023 • 21

mayank-mishra/starcoderbase-GPTQ-8bit-128g

Updated May 5, 2023 • 3

mayank-mishra/santacoder-GPTQ-4bit-128g

Updated May 4, 2023 • 2

mayank-mishra/santacoder-GPTQ-8bit-128g

Updated May 4, 2023 • 1

Datasets 1

mayank-mishra/glaive-code-assisstant-v3-20k

Viewer • Updated Jun 5 • 20k • 59