[Error?] Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.

#7
by flexai - opened

When running the example from https://huggingface.co/distil-whisper/distil-large-v3#sequential-long-form, I get the warning: `Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.`

Is this expected, or does it indicate an error in my setup?

In addition to the loading example, I prepare the model locally during the Docker image build with the following function:

def download_model():
    import os
    import transformers
    from huggingface_hub import snapshot_download

    # Ensure the target folder exists (MODEL_CACHE_DIR is defined elsewhere)
    os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
    # Fetch only the weights plus the config/tokenizer files
    snapshot_download(
        repo_id="distil-whisper/distil-large-v3",
        allow_patterns=["model.safetensors", "*.json", "*.txt"],
        local_dir=MODEL_CACHE_DIR,
    )
    transformers.utils.move_cache()

Then, when loading, I pass MODEL_CACHE_DIR instead of the model ID string.
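For context on what the `allow_patterns` filter above actually keeps, here is a small stdlib-only sketch. The file list is illustrative (the real repo listing may differ), but the glob matching mirrors how patterns like `*.json` behave:

```python
from fnmatch import fnmatch

# Illustrative file names only; check the actual repo for the real listing.
repo_files = [
    "model.safetensors",
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "preprocessor_config.json",
    "README.md",
]

patterns = ["model.safetensors", "*.json", "*.txt"]

# A file is downloaded if it matches at least one pattern.
kept = [f for f in repo_files if any(fnmatch(f, p) for p in patterns)]
skipped = [f for f in repo_files if f not in kept]

print("kept:", kept)
print("skipped:", skipped)  # e.g. README.md is excluded
```

This makes it easy to verify that the tokenizer/config JSON files and `merges.txt` are included, so the warning is unlikely to come from missing tokenizer files.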
