I have trained a Qwen 14B model on a small dataset, but I'm now stuck because I have nowhere to run it for inference (the paid inference endpoints on HF cost quite a lot). Does anyone know of somewhere I can deploy my model and use it via an API for a reasonable cost, or ideally for free? Thanks.
The 14B model might just barely fit in the 16 GB of VRAM on the free tier of Google Colab, assuming you quantize it to 4-bit at load time (sketch below).
I'm not familiar with Colab itself, so ask someone else how to use it.
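For reference, here's a minimal sketch of a 4-bit load with transformers and bitsandbytes. The repo ID is a placeholder for your fine-tuned model, and whether it fits on a free-tier GPU alongside the KV cache is an assumption you'd need to verify:

```python
# Minimal sketch: load a 14B model in 4-bit with bitsandbytes.
# "your-username/qwen-14b-finetune" is a placeholder for your own repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("your-username/qwen-14b-finetune")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/qwen-14b-finetune",
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the GPU automatically
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```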
You might want to give Predibase a try.
Download it and run it with LM Studio, then use its OpenAI-compatible API to access it.
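Something like the following should work once LM Studio's local server is running. Note that http://localhost:1234/v1 is LM Studio's usual default address, and the model name is a placeholder for whatever you loaded in the app, so check both on your side:

```python
# Minimal sketch: query LM Studio's local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server address
    api_key="lm-studio",                  # the server ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen-14b-finetune",            # placeholder: use the model name shown in LM Studio
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```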
glhf.chat offers a free API for any LLM on Hugging Face, although it has a pretty low rate limit of 480 requests per 8 hours (still, it's free).
Try Google Colab. You can run it on the free tier.
Rent a VM on RunPod. I would recommend 24 GB of VRAM and quantizing the model to 8-bit. You can serve it with TGI (Text Generation Inference) and stop the VM when not in use. Serverless inference on RunPod is also a great option for a small volume of requests (query sketch below).
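Once TGI is up on the pod, querying it from anywhere is a short script. This is a rough sketch: the host and port are placeholders for your pod's public endpoint, and you'd need to have launched TGI with your model (and its quantization option) on the pod first:

```python
# Minimal sketch: call a TGI server running on a RunPod VM.
import requests

TGI_URL = "http://YOUR-POD-HOST:8080/generate"  # placeholder: your pod's endpoint

payload = {
    "inputs": "Explain quantization in one sentence.",
    "parameters": {"max_new_tokens": 100, "temperature": 0.7},
}
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```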