zero-gpu-explorers/README · use authentication in huggingface Gradio API!!!(hosting on ZeroGPU)

Nov 4, 2024

Guys.

I have already hosted my code on ZeroGPU (I subscribed to PRO for that).

When I visit it on the webpage (logged in as my PRO user), I do receive 5x the usage quota of free users.

But when I call it from Python code with gradio_client, I can indeed post requests to the Gradio API that I host on an HF Space using ZeroGPU, yet I found that my quota is the one for a user who is not logged in.

By the way, how do I know the quota is the not-logged-in one?

I ran some tests and got the following numbers:

NOT LOGGED IN: the quota is about 180s
LOGGED IN: the quota is 300s
PRO USER: the quota is 1500s

So I just want to find some way to solve this problem: I want to use my PRO user in my code!!!

I have tried sending HF tokens or headers (including cookies), but they have not worked and I am still treated as not logged in.

The error looks like this:
gradio_client.exceptions.AppError: The upstream Gradio app has raised an exception: You have exceeded your GPU quota (150s requested vs. 149s left). Create a free account to get more usage quota.
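
The error text above carries both the requested and the remaining seconds. A minimal sketch, assuming the message keeps the wording quoted in this thread, of pulling those two numbers out so a script can decide whether to wait or give up. The helper name `parse_quota_error` is hypothetical and not part of gradio_client.

```python
import re

# Pattern based only on the error message quoted in this thread; the wording
# may change on the Hugging Face side, so treat it as an assumption.
_QUOTA_RE = re.compile(r"\((\d+)s requested vs\. (\d+)s left\)")


def parse_quota_error(message):
    """Return (requested_seconds, seconds_left), or None if the text does not match."""
    match = _QUOTA_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


example = (
    "The upstream Gradio app has raised an exception: You have exceeded your "
    "GPU quota (150s requested vs. 149s left). Create a free account to get "
    "more usage quota."
)
print(parse_quota_error(example))
```

In a real script the message would come from `str(exc)` inside an `except` block that catches the `gradio_client.exceptions.AppError` shown above.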

John6666

Nov 4, 2024

It's not the crux of this issue, but if the duration setting exceeds 120 seconds, it's basically buggy.

Nerva1228

Nov 18, 2024

It's not the crux of this issue, but if the duration setting exceeds 120 seconds, it's basically buggy.

No, I just ran a test to find the quota.
Actually, the duration is maybe 5~10s.

John6666

Nov 18, 2024

Actually, the duration is maybe 5~10s.

Oh. If so, it's definitely a bug. 😅
I'm also using Zero GPU space, and sometimes the login quota doesn't work properly.
However, in my case, for some reason it's a different pattern from yours, and the login status doesn't work properly in the web browser. If I sign in explicitly using the OAuth sign-in button on the space, it works.
I've never used the Gradio Client...

Moibe

Nov 23, 2024

I have exactly the same need as Nerva1228. Thanks, John6666, but it seems that (it sounds like a meme) it's not a bug, it's a feature. Not being able to use my PRO user quota programmatically is not good. I want to test my own Spaces with different combinations from within my code, and I'm as limited as a normal user. Is this intended to be that way, is there a solution, or is it maybe a feature for the future?

frostbyte07

Nov 26, 2024

I got it to work!!!!

I duplicated the black-forest-labs/FLUX.1-dev Space. (!!!Important!!! Go to their model card page and get access granted to their gated model first. Then you can duplicate the Space.)
Set the environment variables:
HF_TOKEN=<your HF token> (I used a token with read and write permissions)
ZEROGPU_V2=true
ZERO_GPU_PATCH_TORCH_DEVICE=1

Then in your own (duplicated) space:

Navigate to the Files
Click on app.py
Change

@spaces.GPU(duration=75)  # The duration max value can be 120 but this wasn't enough and still didn't work for me

to

@spaces.GPU()

Make sure that, in your Python code that uses the gradio_client library, HF_TOKEN is set in the environment, or that you pass the hf_token parameter when creating the client.
Example:

import os
from gradio_client import Client

client = Client("your_duplicated_space/FLUX.1-dev", hf_token=os.getenv("HF_TOKEN"))
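
A slightly more defensive version of the example above, as a sketch only. It assumes the `hf_token` parameter used in this thread; the helper name `build_client` and the Space id are placeholders. It fails early when no token is available, so a missing token is not mistaken for a quota problem.

```python
import os


def build_client(space_id, token=None):
    """Create a gradio_client Client that always carries an HF token."""
    token = token or os.environ.get("HF_TOKEN")
    if not token:
        # Without a token the request is anonymous and gets the
        # not-logged-in quota described earlier in the thread.
        raise RuntimeError("Set HF_TOKEN or pass token= explicitly")
    # Imported lazily so the token check works even before gradio_client
    # is installed.
    from gradio_client import Client
    return Client(space_id, hf_token=token)


# Usage (placeholder Space id, needs network access and a valid token):
# client = build_client("your_duplicated_space/FLUX.1-dev")
```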

Moibe

Nov 26, 2024


And what about the usage of the quota, you are able to use your PRO User quota??

frostbyte07

Nov 26, 2024

And what about the usage of the quota, you are able to use your PRO User quota??

Yes. Try it out and see if it works for you.

Moibe

Dec 12, 2024

I tried it and it doesn't work. Reducing the duration like this is one thing: @spaces.GPU(duration=75), and that's OK.

But that doesn't change the fact that we are consuming quota as a normal user and not as a PRO user when we use it programmatically.

The problem stated by Nerva1228 and me is still there: we want to be able to use our 25-minute quota via our Python code, not only in the HF Spaces interface.

John6666

Dec 12, 2024

orsk-moscow

Dec 14, 2024

I am stuck with the same problem.
Is there any way to upvote this issue?

John6666

Dec 14, 2024

@hysts Is this a problem specific to Zero GPU space?
Is it better to raise the issue with HF or Gradio?

hysts

ZeroGPU Explorers org Dec 14, 2024

@John6666 There’s no need to create an issue for this. My understanding is that our infra team is aware that ZeroGPU quota does not work with API use case and that there's a request for it, but it simply hasn’t been implemented yet. cc @cbensimon

John6666

Dec 14, 2024

Thank you! I understand. I will hold off on opening an issue so as not to cause any unnecessary trouble.😅

asr143r

Dec 19, 2024

•

edited Dec 19, 2024

@John6666 There’s no need to create an issue for this. My understanding is that our infra team is aware that ZeroGPU quota does not work with API use case and that there's a request for it, but it simply hasn’t been implemented yet. cc @cbensimon

If it is indeed a bug recognized internally within HF, will Hugging Face compensate PRO subscribers for not being able to use the quota they have paid for? @hysts

hysts

ZeroGPU Explorers org Dec 20, 2024

@asr143r I'm not in a position to answer refund-related questions, so I've forwarded your inquiry internally.

Moibe

Dec 20, 2024

•

edited Dec 20, 2024

That’s great. The important thing is that HF notices and tells us whether it’s an error or the way it has to be.

cristianduguet

Dec 22, 2024

Is there a public ticket link we can look at so we can be aware of any changes? I have asked for a refund a couple of hours after I got PRO because of this, and I am waiting for them to fix it. In the meantime, open to other options.

LilithX6X

Dec 26, 2024

•

edited Dec 26, 2024

Ditto!

MegaTronX

Dec 26, 2024

•

edited Dec 26, 2024

@hysts It's not just the api. I get this as a PRO user on my own Zero GPU space (https://huggingface.co/spaces/MegaTronX/Llama-3_1-8B-Abliterated) all the time. My subscription fee is basically a donation ;) to Hugging Face at this point since Zero GPU spaces are also the reason I originally subscribed.

If the infrastructure team is too busy at the moment to tackle this I'd be happy to help. I've been doing DevOps since before they had a name for it and I've built cloud infrastructure for enterprise clients from Intuit (only first year 3rd party vendor in their company history that didn't crash during war room week, aka tax week) to Walmart (oversaw the infrastructure for the inception of WalmartOne, their internal social media site that employees view their paychecks on). There are also a number of AI cloud infrastructure and SaaS companies (e.g. Runpod, Fireworks AI, etc.) that are trying, and failing miserably in many cases, to interface with Hugging Face repos, including for their own Serverless (and Dedicated in some cases) cloud GPU offerings. I'm sure I could help with that as well, assuming Hugging Face isn't purposefully eschewing those companies for any reason, and is willing to negotiate a beneficial and compensatory solution with them. I'm more than confident that they'd jump at the chance to throw money at Hugging Face to get their already-sold, never mind advertised, services that don't work, working. I don't need to consult with them to know that they're extremely desperate to get that stuff working.

hysts

ZeroGPU Explorers org Dec 26, 2024

Hi @MegaTronX It seems that the SDK version of your Space is gradio==4.31.3 https://huggingface.co/spaces/MegaTronX/Llama-3_1-8B-Abliterated/blob/2af4a53bd398c40b66faaed62b251e87bf953ab7/README.md?code=true#L7, but can you try upgrading it to the latest gradio==5.9.1? IIRC, there was a bug in old gradio where the ZeroGPU quota was not properly handled when using gr.ChatInterface, but it's fixed in the latest gradio.

MegaTronX

Dec 27, 2024

@hysts Yup, you nailed it, upgrading to the latest gradio version got rid of the GPU quota errors I was getting on the site. Thanks for solving that mystery for me.

For anyone else who is getting this error when using their Space on the Hugging Face site, there is a potential caveat to be aware of when upgrading gradio that tripped me up initially. Sometimes when you change a number, you don't actually change the number, i.e. check the logs. After updating the gradio version in the requirements file, I was still getting the error. When I looked under the container logs section I saw "IMPORTANT: You are using gradio version 4.31.3, however version 4.44.1 is available, please upgrade." Digging through the build logs showed that 5.9.1 was being installed initially, but later in the logs it was being uninstalled, followed by 4.31.3 being reinstalled. I didn't see any entries indicating why, but I assume it's probably just too big of a leap between the two versions, especially since 5.0 was a major release with fundamental changes to the code base. Rather than trying my luck with incremental upgrades to see if I could get there (which I might have done if I had 5 likes on the Space), I just created a new Space and, after working through the deprecated code etc., made my chatbot talk to me for an excessive amount of time with no quota errors in sight.
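
One way to confirm which version actually ended up installed, rather than trusting the requirements file, is to log it when the app starts. A minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration, and putting this at the top of app.py is an assumption about how the Space is laid out.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Printed once at startup so the value shows up in the container logs.
print("gradio version at startup:", installed_version("gradio"))
```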

HassanTrabelsi

Jan 6

•

edited Jan 6

On my side, even with gradio==5.9.1 and a new Space, I still see the same error with my PRO account HF_TOKEN.

deeduckme

Jan 7

what is the solution ?

deeduckme

Jan 7

@fbi hello hello la police ?

HassanTrabelsi

Jan 7

•

edited Jan 7

I even upgraded to gradio 5.10.0, which was released today, and it's still not working. It seems like we still have to wait.

Moibe

about 1 month ago

The problem is not about the gradio version. It's about the service not being delivered correctly by Hugging Face.

deeduckme

about 1 month ago

yes ! please hugging face do something !

HassanTrabelsi

about 1 month ago

Indeed I can see that the root of the problem lies here:

from gradio_client import Client
client = Client(self.hf_space, hf_token=self.hf_token)

This is the part where I communicate with the model, so I suspect the issue originates from gradio_client.

MegaTronX

about 1 month ago

On my side, even with gradio==5.9.1 and a new Space, I still see the same error with my PRO account HF_TOKEN.

For clarification, upgrading to the latest gradio version in my case was specific to fixing the error on the Hugging Face Spaces site itself. Based on previous comments, a fix for the error when making API calls has yet to be released.

deeduckme

about 1 month ago

do you know when the fix will be released ?

Moibe

about 1 month ago

@HassanTrabelsi I don't think that the root of the problem lies here:

from gradio_client import Client
client = Client(self.hf_space, hf_token=self.hf_token)

That simply creates the client with the respective auth to use it. That works; we all have access to the Space being used.

The problem lies in the amount of time we can use it. When used inside the HF platform, the quota is measured against our user, so it detects whether you are a PRO user or not. But THE PROBLEM is when we use it programmatically, for example in a Python app, because there is no way to show that we are a PRO user, so it gives the usual 300 seconds, which are based solely on the IP.

HF must allow us to use our PRO time in this way too! If not, it is useless.

orsk-moscow

about 1 month ago

@hysts, I noticed some users claimed a refund of a monthly payment. Does HF have a standard procedure for that? If not, how can I request such a refund?

hysts

ZeroGPU Explorers org about 1 month ago

@hysts, I noticed some users claimed a refund of a monthly payment. Does HF have a standard procedure for that? If not, how can I request such a refund?

I don't know the answer either, so let me cc @meganariley .

asr143r

about 1 month ago

@hysts, I noticed some users claimed a refund of a monthly payment. Does HF have a standard procedure for that? If not, how can I request such a refund?

I claimed my refund by simply emailing support@huggingface.co and citing this issue.
The refund was provided within 24 hours with no questions asked.

deeduckme

30 days ago

When will this work?

hysts

ZeroGPU Explorers org 28 days ago

My understanding is that the implementation has been completed on both the Hub and gradio sides, and if you use gradio==5.12.0 and pass your HF token to the client, your logged-in quota should be used.
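
A small sketch of checking a gradio version string against the 5.12.0 threshold mentioned above. The threshold comes only from this comment (other posts in the thread report mixed results on older versions), and the helper names are hypothetical. The parser assumes plain "X.Y.Z" strings and does not handle pre-release suffixes.

```python
# Threshold taken from the comment above; not verified against release notes.
MIN_GRADIO = (5, 12, 0)


def version_tuple(version):
    """Turn a plain 'X.Y.Z' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split(".")[:3])


def has_api_quota_support(version):
    """True if the version is at least the threshold named in the thread."""
    return version_tuple(version) >= MIN_GRADIO


for candidate in ("4.31.3", "5.9.1", "5.10.0", "5.12.0"):
    print(candidate, has_api_quota_support(candidate))
```

Comparing tuples of integers avoids the string-comparison trap where "5.9.1" would sort after "5.12.0".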

Moibe

28 days ago

@hysts thanks a lot for creating the video. Whoa, I was so convinced it didn't work that I just stuck to the idea that HF_TOKEN had nothing to do with HF user authentication and never tested again. Your video convinced me to test it again visually as you did, and it worked!! Sorry also to @HassanTrabelsi, you were right.

Since which version has it worked? Some Spaces I have with 4.39.0 don't work and some others with 5.9.1 work perfectly. I would like to know the exact version in which it was fixed.

hysts

ZeroGPU Explorers org 28 days ago

My understanding is that the implementation on the gradio side was done in this PR. It's included in the latest gradio==5.12.0.

m1k3wn

22 days ago

So grateful to find this thread... Thanks everyone for the advice! I’ve just finished fine-tuning my first ever models: nidra-v1 and nidra-v2 from flan-T5-base. They're both trained to interpret dreams. I’ve (eventually) managed to make and deploy a Space at huggingface.co/spaces/m1k3wn/nidra and make requests to it from axios. But the requests take about 90 seconds to return as I’m on the free CPU/16GB plan. If I upgrade to PRO, will it allow me to make faster API requests? Or will I be faced with the recurrent issue mentioned above? Thanks so much, I'm really new to this side of programming.

Moibe

21 days ago

@hysts you are right, thanks for the detailed information.

Moibe

17 days ago

@m1k3wn You can upgrade your hardware without being a PRO user, but at a cost. Getting a PRO account will also let you use ZeroGPU (Nvidia A100), which will be a lot faster too and at no extra cost (only the $9 monthly PRO price).