Error with Tokenizer

#121 opened by wissamee

Hello,
I'm currently fine-tuning the "Mistral-7B-Instruct-v0.1" model and I've encountered an issue that I haven't faced before when using the AutoTokenizer from Transformers. Here's the code I'm using:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    base_model_id,
    padding_side="left",  # reduces memory usage
    add_eos_token=True,
    add_bos_token=True,
)
tokenizer.pad_token = tokenizer.eos_token

However, I'm receiving the following error:

OSError: Can't load tokenizer for 'mistralai/Mistral-7B-Instruct-v0.1'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'mistralai/Mistral-7B-Instruct-v0.1' is the correct path to a directory containing all relevant files for a LlamaTokenizerFast tokenizer.

Does anyone know how to resolve this issue?

I am facing the same issue. Have you found any solution?

I'm not sure if it's relevant, but I'm temporarily using the "Mistral-7B-v0.1" tokenizer until a solution is found. Please keep me informed if there are any updates.
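For reference, that workaround is just a one-line swap, pointing the tokenizer at the base repo instead (a minimal sketch, assuming that repo is accessible to your account; the instruct model uses the same underlying tokenizer):

from transformers import AutoTokenizer

# Borrow the base model's tokenizer as a stopgap until the instruct repo loads
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")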

Hi, I have the same error. Pinning flash_attn==2.5.8 gets rid of the tokenizer error, but it creates a new module-import error when downloading models.

Requirements to reproduce:
flash_attn==2.5.8
transformers==4.41.2
torch==2.2.2
requests==2.31.0
mlflow==2.13.1
bitsandbytes==0.42.0
accelerate==0.31.0

Databricks 14.3 ML cluster with CUDA version 11.8
Has anyone got a fix?

This isn't a library error. I was facing the same issue until I realized I hadn't logged in to Hugging Face:

from huggingface_hub import login

login(token="your_access_token_here")  # a User Access Token from https://huggingface.co/settings/tokens

I'm trying to deploy the model on an AKS cluster by adding the env variable 'HF_TOKEN' to mistral-7b.yaml, but I'm still getting an error: '401 Client Error: Unauthorized for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1/resolve/main/adapter_config.json'. Any advice on this? Thanks
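One way to narrow this down (a minimal sketch; HF_TOKEN is the variable name from your yaml) is to run a quick check inside the pod to confirm the variable actually reached the container and that the token itself is valid:

import os
from huggingface_hub import whoami

token = os.environ.get("HF_TOKEN")
print("HF_TOKEN set:", token is not None)  # False means the yaml env block isn't reaching the container
print(whoami(token=token))                 # raises a 401 here if the token itself is invalid or lacks access

If whoami succeeds, the token is fine, and the 401 usually means the process loading the model isn't reading HF_TOKEN; passing the token explicitly to from_pretrained rules that out.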

Good note by rakeshrpm565!!

The error is (in some cases) misleading, and adding the Hugging Face auth token to BOTH the tokenizer and the model fixed this for me:
https://github.com/lineality/huggingface_access_token_cheatsheet
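Concretely, something like this (a minimal sketch; "your_access_token_here" is a placeholder for a real read token, and token= is the auth kwarg in recent transformers versions, replacing the older use_auth_token=):

from transformers import AutoModelForCausalLM, AutoTokenizer

hf_token = "your_access_token_here"  # placeholder: a Hugging Face read token

# Pass the token to BOTH calls; authenticating only one of them can still 401
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1", token=hf_token
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.1", token=hf_token
)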

So the real issue may be...misleading/incorrect error messages.
