南栖

Minami-su

AI & ML interests

NLP, Multimodal, Human intelligence, Autonomous cognition, Self-instruction generation, Enhanced instruction

Recent Activity

liked a model 7 days ago

Infinigence/Megrez-3B-Omni

liked a dataset 13 days ago

mlabonne/orpo-dpo-mix-40k

liked a dataset 13 days ago

LDJnr/Capybara

Minami-su's activity

New activity in huggingface/HuggingDiscussions 4 months ago

[FEEDBACK] Daily Papers

#32 opened 6 months ago by

kramp

New activity in Minami-su/Qwen1.5-0.5B-Chat_llamafy 6 months ago

Could you convert also Qwen/Qwen2-0.5B-Instruct?

#1 opened 6 months ago by

Felladrin

New activity in Minami-su/Qwen1.5-32B-Chat-quip-3bit 6 months ago

Can this model be deployed for inference using the vllm-gptq branch?

#1 opened 6 months ago by

lich60132

New activity in deepseek-ai/DeepSeek-V2-Chat 7 months ago

MoE offloading strategy?

#8 opened 7 months ago by

Minami-su

New activity in Minami-su/IA_14B 9 months ago

Update README.md

#1 opened 9 months ago by

Minami-su

commented a paper 10 months ago

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 183

New activity in Minami-su/Qwen1.5-7B-Chat_mistral 10 months ago

Adding Evaluation Results

#1 opened 10 months ago by

leaderboard-pr-bot

Adding Evaluation Results

#2 opened 10 months ago by

Minami-su

New activity in Minami-su/Qwen1.5-7B-Chat_llamafy 10 months ago

Adding Evaluation Results

#3 opened 10 months ago by

Minami-su

GGUF Creation from Llamafy

#1 opened 10 months ago by

RonanMcGovern

commented 2 papers 10 months ago

Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping

Paper • 2402.14083 • Published Feb 21 • 47

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 95

New activity in OrionStarAI/Orion-14B-Chat 11 months ago

Some text is not renamed to Orion

#4 opened 11 months ago by

J22

llama rename?

#3 opened 11 months ago by

Minami-su

New activity in cloudyu/Mixtral_34Bx2_MoE_60B 12 months ago

source code and paper?

#6 opened 12 months ago by

josephykwang

New activity in KnutJaegersberg/Tess-M-34B-2bit 12 months ago

Re-Quantize Model

#1 opened 12 months ago by

igoforth

New activity in Minami-su/SUS-Chat-34B_2bit 12 months ago

Re-Quantize?

#2 opened 12 months ago by

igoforth

Hessian context length?

#1 opened about 1 year ago by

KnutJaegersberg

New activity in Minami-su/Yi_34B_Chat_2bit about 1 year ago

Hessians?

#2 opened about 1 year ago by

somehumanperson1

Chinese token capabilities?

#1 opened about 1 year ago by

at676