Qwen-7B-Chat / modeling_qwen.py

Commit History

fix sampling in chat stream (13750ae, JustinLin610)
fix slow long sequence inference (e326b6c, yangapku)
Update modeling_qwen.py (d25b5f8, JustinLin610)
update tokenization and readme (acdaf68, yangapku)
deprecate argument stream in model.chat() (ff3a904, yangapku)
update support for flash attn (e3edce3, yangapku)
fix chat streaming (2db302e, yangapku)
add support for flash attn 2 (50ea631, yangapku)
fix kwargs in generate method and update readme (193987f, yangapku)
update config and streaming generation (04df5dd, yangapku)
update config about model precision, fix apply_rotary_pos_emb (26fad65, yangapku)
update readme and fix convert_tokens_to_string (53c9efa, yangapku)
support cpu inference, format file (#9) (cbf815e, JustinLin610)
Update modeling_qwen.py, fix logn bug (f157e4e, logicwong)
fix flash-attention usage (405556d, yangapku)
add resource files (4658aaa, yangapku)
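
Several of the commits above (fix sampling in chat stream, deprecate argument stream in model.chat(), fix chat streaming) touch the chat interface defined in modeling_qwen.py. For orientation, a minimal usage sketch follows. It assumes the standard remote-code loading path for Qwen/Qwen-7B-Chat and the model's chat() and chat_stream() methods; it is an illustration, not part of the commit history.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True pulls in the repository's modeling_qwen.py.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# Non-streaming chat: returns the full response and the updated history.
response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)

# Streaming chat: the stream argument of chat() is deprecated in favour of
# chat_stream(), which yields progressively longer partial responses.
for partial in model.chat_stream(tokenizer, "Tell me a joke.", history=history):
    print(partial)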