Link of model download

#25
by eashanchawla - opened

I am downloading the whisper-large model and caching it at the root level of my directory by doing:

from transformers import WhisperForConditionalGeneration, WhisperProcessor

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-large")
processor = WhisperProcessor.from_pretrained("openai/whisper-large")

My question is: how do I figure out what URL this downloads the model from? I see that there are the original OpenAI model links in convert_openai_to_hf.py (transformers/models/whisper in the GitHub repo for transformers). Is the model being downloaded from there, or is it being downloaded from the Hugging Face Hub (because I also see a convert function that saves a PyTorch model object at a specific location)?

This is needed so I can add this URL to an accept list, allowing my flask app to access and download from it.

Hi @eashanchawla, all the data for this model is on our Hub. When you use from_pretrained("openai/whisper-large"), the files are fetched from the repo at https://huggingface.co/openai/whisper-large/tree/main. For instance, the PyTorch model file is here: https://huggingface.co/openai/whisper-large/blob/main/pytorch_model.bin, and via resolve: https://huggingface.co/openai/whisper-large/resolve/main/pytorch_model.bin
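For scripting an accept list, the resolve URLs can also be constructed programmatically. A minimal sketch using hf_hub_url from the huggingface_hub package (a dependency of transformers); the repo id and filenames below are the ones from this thread:

```python
from huggingface_hub import hf_hub_url

# Build the direct download (resolve) URL for a file in the model repo.
url = hf_hub_url(repo_id="openai/whisper-large", filename="pytorch_model.bin")
print(url)  # https://huggingface.co/openai/whisper-large/resolve/main/pytorch_model.bin

# A specific revision can be pinned instead of the default "main" branch.
pinned = hf_hub_url(
    repo_id="openai/whisper-large",
    filename="preprocessor_config.json",
    revision="e5aba7b7d827c01bf4db9b90d9ea7d670295b212",
)
print(pinned)
```

In practice only the https://huggingface.co host needs to be on the accept list, since every file in the repo resolves under it.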

Thank you for the quick response @radames! I do have a follow-up question, though:
I defined a proxy dict as follows:
PROXY_DICT = {
    'http:': 'proxy.ec.com:8000',
    'https:': 'proxy.ec.com:8000'
}
self.processor = WhisperProcessor.from_pretrained(
    pretrained_model_name_or_path="openai/whisper-large",
    cache_dir=TEMP_DIR,
    force_download=True,
    revision='e5aba7b7d827c01bf4db9b90d9ea7d670295b212',
    proxies=PROXY_DICT
)

I get the following error:
ValueError: We have no connection or you passed local_files_only, so force_download is not an accepted option.

Is there something I am missing in how I am defining the proxy? When I run this command in PowerShell: Invoke-RestMethod -Uri "https://huggingface.co/openai/whisper-large/resolve/main/preprocessor_config.json" -Proxy "http://proxy.ec.com:8000", I do see it pulling config info.
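One local sanity check that needs no network, assuming the requests library (which the Hub client uses for HTTP): requests picks a proxy by matching the URL scheme against the keys of the proxies dict, so a key with a trailing colon such as 'http:' never matches anything. The select_proxy helper makes this visible:

```python
from requests.utils import select_proxy

url = "https://huggingface.co/openai/whisper-large/resolve/main/preprocessor_config.json"

# Keys with a trailing colon never match the URL scheme, so no proxy is selected.
broken = {'http:': 'proxy.ec.com:8000', 'https:': 'proxy.ec.com:8000'}
print(select_proxy(url, broken))  # None

# Scheme-only keys ('http', 'https') do match; values should carry a scheme too.
fixed = {'http': 'http://proxy.ec.com:8000', 'https': 'http://proxy.ec.com:8000'}
print(select_proxy(url, fixed))  # http://proxy.ec.com:8000
```

With no proxy selected, the client falls through to a direct connection, which would explain the "no connection" errors even though the proxy itself works from PowerShell.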

Is it not downloading, but somehow loading from the path instead? What else can I try to make it work?

If I don't set force_download=True, I see this error:

(OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like openai/whisper-large is not the path to a directory containing a file named preprocessor_config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.)
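Once the files have been downloaded successfully through the proxy, a way to avoid repeated network round-trips is to load strictly from the local cache. A sketch based on the offline-mode options documented at the installation page linked in the error above; TEMP_DIR here stands in for whatever cache directory was used for the initial download:

```python
import os

# Force transformers to resolve everything from the local cache; any attempt
# to reach huggingface.co then fails fast instead of retrying the network.
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Loading then looks like the original call, minus force_download:
# from transformers import WhisperProcessor
# processor = WhisperProcessor.from_pretrained(
#     "openai/whisper-large",
#     cache_dir=TEMP_DIR,        # same cache dir used for the initial download
#     local_files_only=True,     # equivalent per-call switch
# )
```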
