Qwen-7B-Chat / modeling_qwen.py

Commit History

fix sampling in chat stream (13750ae, JustinLin610)
fix slow long sequence inference (e326b6c, yangapku)
Update modeling_qwen.py (d25b5f8, JustinLin610)
update tokenization and readme (acdaf68, yangapku)
deprecate argument stream in model.chat() (ff3a904, yangapku)
update support for flash attn (e3edce3, yangapku)
fix chat streaming (2db302e, yangapku)
add support for flash attn 2 (50ea631, yangapku)
fix kwargs in generate method and update readme (193987f, yangapku)
update config and streaming generation (04df5dd, yangapku)
update config about model precision, fix apply_rotary_pos_emb (26fad65, yangapku)
update readme and fix convert_tokens_to_string (53c9efa, yangapku)
support cpu inference, format file (#9) (cbf815e, JustinLin610)
Update modeling_qwen.py, fix logn bug (f157e4e, logicwong)
fix flash-attention usage (405556d, yangapku)
add resource files (4658aaa, yangapku)
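
Several of the commits above (fix sampling in chat stream, deprecate argument stream in model.chat(), fix chat streaming) touch the chat interface defined in modeling_qwen.py. For orientation, a minimal usage sketch follows. It assumes the standard remote-code loading path for Qwen/Qwen-7B-Chat and the model's chat() and chat_stream() methods; it is an illustration, not part of the commit history.

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True pulls in the repository's modeling_qwen.py.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen-7B-Chat", device_map="auto", trust_remote_code=True
).eval()

# Non-streaming chat: returns the full response and the updated history.
response, history = model.chat(tokenizer, "Hello!", history=None)
print(response)

# Streaming chat: the stream argument of chat() is deprecated in favour of
# chat_stream(), which yields progressively longer partial responses.
for partial in model.chat_stream(tokenizer, "Tell me a joke.", history=history):
    print(partial)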