Thanks for this model!
I spent the afternoon working with this model and I've proposed three minor updates to the README:
- Docker compatibility (I was able to confirm it works with CUDA 12.1)
- I hit an issue because I build on a different machine than the one I run on: the build tried to target CUDA architectures the code doesn't support. Being explicit about which architectures we target fixes that.
- The config for this model doesn't actually have a tokenizer defined; it just happens to have the same name.
I've made one other change locally that I did not include in this PR, because it changes behavior and I wanted to confirm that's OK first:
```python
model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16, trust_remote_code=True)
```
Replacing `from_pretrained` with `from_config` like this means you don't need the original model weights downloaded and stored. What do you think? This is an advantage especially on cloud platforms that charge for storage.
Hey @mike-ravkine,
Thanks for the PR.
```sh
export TORCH_CUDA_ARCH_LIST='8.0 8.6 8.7 8.9 9.0'
```
What if newer architectures come out? We'll need to go back and edit this list. Also, it seems like you may have missed architecture version 9.0a.
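One idea, just as a sketch (this helper is hypothetical, not something in the PR): derive the list from whatever GPUs are visible at build time instead of hard-coding it:

```python
# Hypothetical helper: build a TORCH_CUDA_ARCH_LIST value from the
# compute capabilities of the GPUs visible on this machine.
import torch

def detect_arch_list() -> str:
    caps = {torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())}
    return " ".join(f"{major}.{minor}" for major, minor in sorted(caps))

print(detect_arch_list())  # e.g. "8.6" on an RTX 3090
```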
> The config for this model doesn't actually have a tokenizer defined; it just happens to have the same name.
I have updated all of the AWQ models with tokenizers, so either of `tokenizer = AutoTokenizer.from_pretrained(model_name)` or `tokenizer = AutoTokenizer.from_pretrained(config.tokenizer)` should work. I'll gladly accept your change.
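For example, something like this should now work (the repo name is just a placeholder for one of the AWQ models):

```python
from transformers import AutoConfig, AutoTokenizer

# The updated AWQ configs carry a `tokenizer` field pointing at the
# upstream tokenizer repo, so it can be loaded without knowing the
# upstream model name.
config = AutoConfig.from_pretrained("org/falcon-7b-awq", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(config.tokenizer)
```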
> Replacing `from_pretrained` with `from_config` like this means you don't need the original model weights downloaded and stored. What do you think? This is an advantage especially on cloud platforms that charge for storage.
Actually, it does still load the model, just on the CPU. Apart from using `accelerate.init_empty_weights()`, there is no other way to initialize a model with empty weights on a `meta` device.
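A quick sketch of the difference, using the upstream model from your example:

```python
import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)

# from_config alone materializes (randomly initialized) tensors on the CPU,
# so RAM is still allocated for the full model.
model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16, trust_remote_code=True)
print(next(model.parameters()).device)  # cpu

# Inside init_empty_weights() the parameters land on the meta device,
# and no real memory is allocated for them.
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16, trust_remote_code=True)
print(next(model.parameters()).device)  # meta
```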
As for the Docker image, I'd be happy to include that line in the model card.
Thanks for your PR! It definitely helped me update all the AWQ models with their respective tokenizers. And let me know what you think about `TORCH_CUDA_ARCH_LIST` and how to keep it updated as future NVIDIA cards are released.
`model_name` refers to the upstream model, right? In this case it's `tiiuae/falcon-7b`, which did not have a `config.tokenizer`.
Good point on the list of CUDA architectures. The original problem I hit was that if you don't have any GPU hardware installed at compile time, the build decides to support all previous architectures, but the kernels here don't work below 8.0. If you don't set this variable, torch auto-detects based on your current hardware, so perhaps this is best added as a note rather than actually modifying the build instructions.
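For the note, a quick way to check what a given torch build and host actually support:

```python
import torch

# Architectures the installed torch binary was compiled for...
print(torch.cuda.get_arch_list())           # e.g. ['sm_80', 'sm_86', ...]
# ...and the compute capability of the GPU actually present.
print(torch.cuda.get_device_capability(0))  # e.g. (8, 6)
```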
To be clear, I have no issue with `accelerate.init_empty_weights()`. The question is: do you want to require the weights of the original model (even if they're not used at all)? That's a side effect of `AutoModelForCausalLM.from_pretrained`: it will always download the weights. Swapping in `AutoModelForCausalLM.from_config` means you don't need to download the original weights:
```python
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, torch_dtype=torch.float16, trust_remote_code=True)
```
Thanks for clarifying the `init_empty_weights` issue! I was under the wrong impression that `from_pretrained` does not load weights inside an `init_empty_weights` context block.
I have updated all of the AWQ model documentation to reflect this.