Deepseek V3
#93 opened 4 days ago
by
cybercyb
[Q] shared_head weights of MTP
#92 opened 7 days ago
by
huang11
fix for transformers 4.49 compatibility.
#91 opened 9 days ago
by
katuni4ka

Update README.md
#90 opened 9 days ago
by
baishihao
A Small Issue in the Code Implementation of Auxiliary-Loss-Free Load Balancing Expert Bias
#89 opened 14 days ago
by
liyang31163150
Fix generation with latest transformers
#88 opened 17 days ago
by
kylesayrs

Add pipeline tag
#86 opened 22 days ago
by
nielsr

Some of the safetensor files are not marked as safe
#85 opened 24 days ago
by
tanmaylaud

Update README.md
#84 opened 26 days ago
by
MTayira
ValueError: Must flatten tensors with uniform dtype but got torch.bfloat16 and torch.float8_e4m3fn
#82 opened 29 days ago
by
ajtakto
Update README.md
#81 opened about 1 month ago
by
FYSIOBASEN
Update README.md
#80 opened about 1 month ago
by
zhup
Update README.md
#79 opened about 1 month ago
by
zhup
chat
#77 opened about 1 month ago
by
rojithonline
DeepSeek-V3-lite naming conventions?
6
#76 opened about 1 month ago
by
AlphaGaO
torch.distributed.DistNetworkError
#75 opened about 1 month ago
by
yu19920006607

remove reference to deprecated transformers code
2
#74 opened about 1 month ago
by
winglian

Update README.md
#73 opened about 1 month ago
by
SamimSaikia
DeepSeek R1 answer ChatGPT ??
4
#72 opened about 1 month ago
by
valerebron

ValueError: Unrecognized configuration class <class 'transformers_modules.configuration_deepseek.DeepseekV3Config'> to build an AutoTokenizer.
10
#69 opened about 2 months ago
by
ajtakto
Parallelized script
#67 opened about 2 months ago
by
ajtakto
I am getting an error message while executing pip install -r requirements.txt
5
#64 opened about 2 months ago
by
yu19920006607

`aux_loss_alpha` should be 1e-4 instead of 1e-3?
#61 opened about 2 months ago
by
cuichenx

captcha not loading on edge
#60 opened 2 months ago
by
leo-smi
Upload shreya.zip
#59 opened 2 months ago
by
Msdthala
Upload IMG_20250111_184317.jpg
#58 opened 2 months ago
by
Sajalhero

Auxiliary-loss-free expert routing
2
#56 opened 2 months ago
by
qing9
AI Games
#55 opened 2 months ago
by
ChickenUJHAYIUSGU

Upload IMG_0509 4.HEIC
#54 opened 2 months ago
by
borhanrabbany

how to inference with mtp?
#53 opened 2 months ago
by
duanyu
Does it support ollama
2
#52 opened 2 months ago
by
sminbb
Create gngn
#49 opened 2 months ago
by
axingd
Missing tool call in system prompt
2
#48 opened 2 months ago
by
bchenfireworks
Update config.json
#47 opened 2 months ago
by
STATIKwitak

Rename figures/benchmark.png to figures/𓇋𓀀𓍿.png
#46 opened 2 months ago
by
STATIKwitak

Rename figures/benchmark.png to figures/𓇋𓀀𓍿.png
#45 opened 2 months ago
by
STATIKwitak

Upload IMG_0295.HEIC
#42 opened 2 months ago
by
Umarkhan499

vLLM on A100s
6
#41 opened 2 months ago
by
fsaudm
When do you plan to integrate Huggingface Transformer?
#40 opened 2 months ago
by
echo-yi
Deciphering messages
1
#39 opened 2 months ago
by
DoctorDonald
Update README.md
#38 opened 2 months ago
by
chaitanyayerroju
Update README.md
1
#37 opened 2 months ago
by
TomGrc
Training problem
3
#29 opened 2 months ago
by
DonGan13
Update README.md
1
#28 opened 2 months ago
by
Wisnet

Update README.md
2
#27 opened 2 months ago
by
Aikun7777777
Failed to run the model with 4 nodes of 8 4090
17
#25 opened 2 months ago
by
aisensiy
kill openai,come on
#24 opened 2 months ago
by
chaochaoli
Update modeling_deepseek.py
1
#23 opened 2 months ago
by
erichartford

is_torch_greater_or_equal_than_1_13 deprecated
#22 opened 2 months ago
by
erichartford
