Great Models Think Alike and this Undermines AI Oversight
Paper: arXiv 2502.04313
The paper judges the effectiveness of this approach only through perplexity. Perplexity essentially measures how "perplexed" (surprised) a language model is when predicting the next token: if the model generates tokens at random, perplexity is very high, whereas if it is confident about a small set of candidate tokens, perplexity is low. But inserting a predefined, fixed token after every token trivially makes the model more confident about what comes next, since those positions are perfectly predictable. So of course perplexity drops, almost by construction. Doesn't that make it a misleading metric here?
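To make the concern concrete, here is a toy sketch (my own illustration, not from the paper): perplexity is the exponential of the average per-token negative log-likelihood, and interleaving a token the model predicts with near-certainty dilutes that average. The probability values below are made up for demonstration.

```python
import math

def perplexity(token_probs):
    """Perplexity = exp of the mean negative log-likelihood,
    given the probability the model assigned to each observed token."""
    nll = [-math.log(p) for p in token_probs]
    return math.exp(sum(nll) / len(nll))

# Hypothetical per-token probabilities for an ordinary sequence.
normal_seq = [0.2, 0.1, 0.3, 0.25]
print(perplexity(normal_seq))  # ~5.1

# Interleave a fixed separator token the model predicts with p ~= 0.99
# after every real token: half the positions become trivially easy.
with_fixed = [0.2, 0.99, 0.1, 0.99, 0.3, 0.99, 0.25, 0.99]
print(perplexity(with_fixed))  # ~2.3, lower by construction
```

The real tokens are predicted no better than before; the headline number improves only because the easy positions are averaged in.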