kiran

kira

·

ki6an

AI & ML interests

agi

Recent Activity

updated a model 5 days ago

kira/pxl-tokenizer

published a model 5 days ago

kira/pxl-tokenizer

liked a Space 18 days ago

Writer/Financial_LLM_Performance_Leaderboard

kira's activity

upvoted a paper about 1 month ago

Sigma: Differential Rescaling of Query, Key and Value for Efficient Language Models

Paper • 2501.13629 • Published Jan 23 • 44

upvoted 3 collections 4 months ago

xLAM models

xLAM: A Family of Large Action Models to Empower AI Agent Systems: https://github.com/SalesforceAIResearch/xLAM • 11 items • Updated 20 days ago • 47

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 292

SmolLM2

State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 18 days ago • 245

upvoted 2 collections 8 months ago

Mini Pretrain Datasets

9 items • Updated Jul 9, 2024 • 9

Useful Pretrain-Datasets

pretrain-datasets with (maybe) good quality • 20 items • Updated Jun 12, 2024 • 1

upvoted a collection 10 months ago

Yi-1.5 (2024/05)

10 items • Updated May 20, 2024 • 92

upvoted a collection 11 months ago

GPT-4 generated datasets

Collection of some GPT-4 generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs. • 18 items • Updated Apr 16, 2024 • 10

upvoted a paper 11 months ago

Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length

Paper • 2404.08801 • Published Apr 12, 2024 • 67

upvoted 4 papers about 1 year ago

Tuning Language Models by Proxy

Paper • 2401.08565 • Published Jan 16, 2024 • 23

Extending LLMs' Context Window with 100 Samples

Paper • 2401.07004 • Published Jan 13, 2024 • 16

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16, 2024 • 38

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Paper • 2401.06951 • Published Jan 13, 2024 • 26

upvoted a collection about 1 year ago

Papers about model merging

referenced in the mergekit repo: https://github.com/cg123/mergekit • 4 items • Updated Feb 13, 2024 • 14

upvoted 6 papers over 1 year ago

CogVLM: Visual Expert for Pretrained Language Models

Paper • 2311.03079 • Published Nov 6, 2023 • 26

DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models

Paper • 2309.14509 • Published Sep 25, 2023 • 18

One Wide Feedforward is All You Need

Paper • 2309.01826 • Published Sep 4, 2023 • 33

SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference

Paper • 2307.02628 • Published Jul 5, 2023 • 10

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding

Paper • 2306.17107 • Published Jun 29, 2023 • 11

Extending Context Window of Large Language Models via Positional Interpolation

Paper • 2306.15595 • Published Jun 27, 2023 • 53