Shamane (Siri)

updated 2 datasets 2 months ago

arcee-train/rl-instruction-filtered

Viewer • Updated Sep 26 • 4.14k • 33

arcee-train/rl-instruction

Viewer • Updated Sep 26 • 7.5k • 35

upvoted a paper 2 months ago

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19 • 135

updated a model 3 months ago

arcee-train/shamane-9-12-untrained-merge

Text Generation • Updated Sep 12 • 4

updated 2 datasets 3 months ago

arcee-train/no-base-combined-dataset

Viewer • Updated Sep 12 • 1.73k • 35

arcee-train/9-2-combined-dataset

Viewer • Updated Sep 11 • 1.73k • 36

updated a model 3 months ago

arcee-train/untrained-merged-random-coeffs

Text Generation • Updated Sep 10 • 5

updated a dataset 3 months ago

arcee-train/my-combined-dataset

Viewer • Updated Sep 10 • 1.73k • 36

updated 2 models 3 months ago

arcee-train/pplist-merged-untrained-with-base-layernorm-embedding

Text Generation • Updated Sep 10 • 8

arcee-train/pplist-merged-untrained-with-base

Text Generation • Updated Sep 5 • 387

updated a dataset 3 months ago

arcee-train/logits-dataset-full-set-top-50

Viewer • Updated Aug 30 • 1.73k • 36 • 1

updated a model 5 months ago

arcee-ai/Llama-3-SEC-Chat

Text Generation • Updated Jun 20 • 59 • 34

liked a model 5 months ago

arcee-ai/Llama-3-SEC-Chat

Text Generation • Updated Jun 20 • 59 • 34

updated a model 6 months ago

arcee-ai/cpt-16B-auto-sft-ties-post-merge-auto-dpo

Text Generation • Updated Jun 10 • 8

updated 2 models 7 months ago

arcee-ai/Mixtral-8x7B-Instruct-v0.1-Finance

Updated May 16

arcee-ai/teeny-tiny-mixtral

Text Generation • Updated Apr 23 • 15 • 1

liked a model 7 months ago

chargoddard/llama3-42b-v0

Text Generation • Updated Apr 24 • 445 • 117

New activity in jetmoe/jetmoe-8b 8 months ago

When can we have the training code as illustrated in the paper.

12

#5 opened 8 months ago by

Shamane

New activity in jetmoe/jetmoe-8b-chat 8 months ago

Seems like still we can't load this model with the Transformers library?

2

#2 opened 8 months ago by

Shamane

New activity in arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer 8 months ago

Why is the size of pruned model bigger than the original ones after 24 layers been sliced?

4

#1 opened 8 months ago by

iheardyoulooking

