Is my understanding correct that the monkey patch only needs to be added for inference?
i.e. when I convert this model to GGML/GPTQ, I will need to make sure the inference engine is using this patch logic, right?
Correct
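For anyone wiring this up themselves, here is a rough sketch of the kind of RoPE position-interpolation monkey patch this refers to. It is written against the transformers LLaMA code of this era; the scale factor and class internals are illustrative assumptions (they change between transformers versions), and the patch has to run before the model is instantiated.

```python
# Illustrative sketch of a linear RoPE position-interpolation patch.
# SCALE and the class internals are assumptions; adapt to whatever
# LlamaRotaryEmbedding looks like in your installed transformers version.
import torch
from transformers.models.llama import modeling_llama

SCALE = 0.25  # e.g. squeeze 8192 positions into the original 2048-position range

class ScaledRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)
        # Pre-compute cos/sin for the extended context, with the position index
        # scaled down so it interpolates within the range the base model saw.
        self.max_seq_len_cached = int(max_position_embeddings / SCALE)
        t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype) * SCALE
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )

# Apply before AutoModelForCausalLM.from_pretrained(...) so the LLaMA attention
# layers pick up the patched class when they are constructed.
modeling_llama.LlamaRotaryEmbedding = ScaledRotaryEmbedding
```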
Have you looked into applying this as a config.json patch activated with trust_remote_code=True, like how Landmark Attention is applied, e.g. at https://huggingface.co/eugenepentland/Minotaur-13b-Landmark ?
Then Transformers could auto-load it rather than needing manual editing of inference code. That could make it a lot more accessible, if it's possible?
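Roughly what I have in mind, with file and class names purely illustrative: the repo ships its own modeling file, and config.json points Transformers at it via "auto_map", so trust_remote_code=True pulls the patched classes in for the user automatically.

```python
# Sketch of adding the custom-code hook to an existing config.json.
# "modeling_llama_scaled.ScaledLlamaForCausalLM" is a hypothetical
# module/class name, not the contents of any actual repo.
import json

config_patch = {
    "auto_map": {
        # "<modeling file in the repo>.<class name>"
        "AutoModelForCausalLM": "modeling_llama_scaled.ScaledLlamaForCausalLM"
    }
}

# Merge the hook into the model's config.json (path is a placeholder).
with open("config.json") as f:
    config = json.load(f)
config.update(config_patch)
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```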
That'd be wonderful! I think that will really help to get people using your model. I will provide a quantised GPTQ once that is done, and publicise your work.
Thank you
@TheBloke Since @emozilla has already added the code for trust_remote_code, can you take it from there? https://huggingface.co/emozilla/open_llama_7b-scaled
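Assuming that repo is set up the usual way (I haven't verified its contents), loading it should just be:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "emozilla/open_llama_7b-scaled"
tokenizer = AutoTokenizer.from_pretrained(repo)
# trust_remote_code=True opts in to running the modeling code shipped in the repo
model = AutoModelForCausalLM.from_pretrained(repo, trust_remote_code=True)
```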
Since this is a LoRA, I don't think there's much benefit to putting the code here, no? Only in the final merged model repository.
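For reference, the merge step would look roughly like the sketch below; the base model id, LoRA path, and output directory are placeholders, and the merged folder is where the custom modeling file and auto_map entry would then be added before uploading.

```python
# Sketch of merging the LoRA into its base model so the merged repo can carry
# the trust_remote_code files. All ids and paths here are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "openlm-research/open_llama_7b"   # assumed base model
lora_id = "path/or/repo-of-this-lora"       # placeholder for this LoRA

base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
merged = PeftModel.from_pretrained(base, lora_id).merge_and_unload()

out_dir = "open_llama_7b-merged"
merged.save_pretrained(out_dir)
AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)
# Add the custom modeling file and the "auto_map" config entry to out_dir,
# so users only need trust_remote_code=True at load time.
```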