Can't find 'adapter_config.json' at 'meta-llama/Meta-Llama-3-8B-Instruct'

#1
by wvangils - opened

Hi, thank you for this resource, really appreciate it! I'm trying to use your PEFT adapter with Llama3-instruct and get an error:

ValueError: Can't find 'adapter_config.json' at 'meta-llama/Meta-Llama-3-8B-Instruct'

From what I can tell this is a versioning issue between transformers and peft, see https://medium.com/@Thimira/how-to-fix-the-cant-find-adapter-config-json-error-with-hugging-face-2e0a16643f74

I have upgraded transformers to 4.40.0 and peft is on 0.10.0. Any advice on how to proceed and use your adapter model?
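
As a quick sanity check for the version theory, something like the following can confirm which versions the environment actually resolves to at runtime (a generic snippet, not from this thread):

# Print the transformers and peft versions that are actually imported;
# the adapter_config.json error is commonly caused by a mismatch between the two
import transformers
import peft

print('transformers:', transformers.__version__)
print('peft:', peft.__version__)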

Hi, I currently am not able to check the versions, but if I am not mistaken I used the latest version of each. I forgot to update requirements.txt in our repo (https://github.com/UnderstandLingBV/LLaMa2lang), but maybe give those versions a try, or fiddle around a bit with the latest for everything except PEFT (as I am not sure I also updated that one).

Let me know if any combination works, otherwise I can check (quite a bit) later

Wait, now that I read your message more carefully: mind you, we used Meta's own Llama-3-8B, which is behind a consent wall. You need to accept their terms and get access. Once you have that, and before running whatever code you run, set your HF_TOKEN environment variable to a token associated with an account that has access. I will update our repo readme on GitHub.

Request access here: https://huggingface.co/meta-llama/Meta-Llama-3-8B
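
For completeness, here is a minimal sketch of passing that token from Python, assuming the HF_TOKEN environment variable is already set to a token from an account with access (not part of the original post):

# Log in to the Hugging Face Hub with a token from an account
# that has accepted the Meta Llama 3 terms
import os
from huggingface_hub import login

login(token=os.environ['HF_TOKEN'])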

Yes, thank you, I know the resource is gated. I have access to the Llama3 models on HF and supplied my token for downloading from the Hub. I'm now trying different PEFT versions to see if this could be the issue.

I got it working now with both transformers 4.36.2 and 4.40.0, using peft version 0.10.0. Code below:

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, GenerationConfig
from peft import PeftModel, PeftConfig
import torch

# Use llama3-instruct as the base model for inference
model_base = 'meta-llama/Meta-Llama-3-8B-Instruct'
model_adapter = 'UnderstandLing/Llama-3-8B-Instruct-nl'

# Make sure we load the model in 4-bit
device = 'cuda'
quantization_config = BitsAndBytesConfig(load_in_4bit=True,
                                         bnb_4bit_compute_dtype=torch.float16)

# Load the quantized base model, then attach the PEFT adapter on top
config = PeftConfig.from_pretrained(model_adapter)
model = AutoModelForCausalLM.from_pretrained(model_base, quantization_config=quantization_config)
model = PeftModel.from_pretrained(model, model_adapter)
tokenizer = AutoTokenizer.from_pretrained(model_base)
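
Not part of the snippet above, but a rough sketch of how the loaded model could then be prompted through the Llama 3 chat template (the Dutch prompt and generation settings are just placeholders):

# Build a prompt with the instruct chat template and generate a Dutch reply
messages = [{'role': 'user', 'content': 'Vat het boek Max Havelaar samen in twee zinnen.'}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt').to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))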

My first impression of using the adapter for Dutch is that the model behaves more like a chat model than an instruct model. I see longer generations, and the instructions are not strictly followed.

That is probably because it is finetuned on oasst1, which is chat-heavy...
