Taehyeon Kim

Kthyeon

https://taehyeon.oopy.io/

AI & ML interests

LLM Inference: Parallel, Speculative, Instructive Decoding

Kthyeon's activity

New activity in microsoft/Phi-3-small-8k-instruct about 1 month ago

Getting the error: "triton.runtime.autotuner.OutOfResources: out of resource: shared memory, Required: 180224, Hardware limit: 166912. Reducing block sizes or `num_stages` may help."

#27 opened 5 months ago by Pranav0511

upvoted a paper 3 months ago

Phantom of Latent for Large Language and Vision Models

Paper • 2409.14713 • Published Sep 23 • 27

liked a model 5 months ago

facebook/multi-token-prediction

Updated Jun 18 • 349

upvoted a paper 6 months ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 95

authored a paper 6 months ago

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Paper • 2406.16758 • Published Jun 24 • 19

upvoted 2 papers 6 months ago

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

Paper • 2404.09221 • Published Apr 14 • 1

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Paper • 2406.16758 • Published Jun 24 • 19

commented on a paper 6 months ago

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters

Paper • 2406.16758 • Published Jun 24 • 19

upvoted a paper 7 months ago

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37

authored a paper 7 months ago

Block Transformer: Global-to-Local Language Modeling for Fast Inference

Paper • 2406.02657 • Published Jun 4 • 37

upvoted a paper 7 months ago

Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models

Paper • 2405.15574 • Published May 24 • 53

authored a paper 8 months ago

Towards Fast Inference: Exploring and Improving Blockwise Parallel Drafts

Paper • 2404.09221 • Published Apr 14 • 1

updated a collection 8 months ago

speculative

Collection

0 items • Updated Apr 16

authored 2 papers 9 months ago

Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions

Paper • 2311.00233 • Published Nov 1, 2023 • 4

Navigating Data Heterogeneity in Federated Learning: A Semi-Supervised Approach for Object Detection

Paper • 2310.17097 • Published Oct 26, 2023 • 3

upvoted 2 papers about 1 year ago

Distort, Distract, Decode: Instruction-Tuned Model Can Refine its Response from Noisy Instructions

Paper • 2311.00233 • Published Nov 1, 2023 • 4

Navigating Data Heterogeneity in Federated Learning: A Semi-Supervised Approach for Object Detection

Paper • 2310.17097 • Published Oct 26, 2023 • 3