About Quantized Models

#14 opened about 2 months ago by infgrad

flash attention

#21 opened 26 days ago by Disassemblern

Model loading size on GPU

#20 opened about 1 month ago by divrajnd

MRL and linear layers

1
#19 opened about 2 months ago by bobox

Can it output sparse vector?

1
#18 opened about 2 months ago by kk3dmax

Does this model only work on GPU?

1
#16 opened about 2 months ago by xPurity

Error when loading model KeyError: 'qwen2'

1
#11 opened about 2 months ago by longluu

Any multilingual variant?

1
#10 opened about 2 months ago by prophet123

Can we have it in GGUF F16/32?

2
#9 opened about 2 months ago by qdrddr

Parameters for peak performance

3
#8 opened about 2 months ago by cvdbdo

Model max_seq_length

6
#6 opened 2 months ago by shuyuej

Fix prompt_name typo

1
#4 opened 2 months ago by mber

Upload ONNX weights

1
#3 opened 2 months ago by Xenova