I have trained a Qwen 14B model on a small dataset, but I'm now stuck because I have nowhere to run it for inference (the paid inference endpoints on HF cost quite a lot). Does anyone know of somewhere I can deploy my model and use it via an API for a reasonable cost, or ideally for free? Thanks.
The 14B model might just barely fit in the 16 GB of VRAM on the free tier of Google Colab, assuming you quantize it to 4-bit at load time (sketch below).
I'm not familiar with Colab itself, so ask someone else how to use it.
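For reference, here's a minimal sketch of a 4-bit load with transformers and bitsandbytes. The repo ID is a placeholder for your fine-tuned model, and whether it fits on a free-tier GPU alongside the KV cache is an assumption you'd need to verify:

```python
# Minimal sketch: load a 14B model in 4-bit with bitsandbytes.
# "your-username/qwen-14b-finetune" is a placeholder for your own repo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # normal-float 4-bit quantization
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained("your-username/qwen-14b-finetune")
model = AutoModelForCausalLM.from_pretrained(
    "your-username/qwen-14b-finetune",
    quantization_config=bnb_config,
    device_map="auto",                     # place layers on the GPU automatically
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```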
You might want to give Predibase a try.
Download it and run it with LM Studio, then use its OpenAI-compatible API to access it.
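Something like the following should work once LM Studio's local server is running. Note that http://localhost:1234/v1 is LM Studio's usual default address, and the model name is a placeholder for whatever you loaded in the app, so check both on your side:

```python
# Minimal sketch: query LM Studio's local OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local server address
    api_key="lm-studio",                  # the server ignores the key, but the client requires one
)

response = client.chat.completions.create(
    model="qwen-14b-finetune",            # placeholder: use the model name shown in LM Studio
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=100,
)
print(response.choices[0].message.content)
```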
glhf.chat offers a free API for any LLM on Hugging Face, although it has a pretty low rate limit of 480 requests per 8 hours (still, it's free).
Try Google Colab. You can run it on the free tier.
Rent a VM on RunPod. I would recommend 24 GB of VRAM and quantizing the model to 8-bit. You can serve it with TGI (Text Generation Inference) and stop the VM when not in use. Serverless inference on RunPod is also a great option for a small volume of requests (query sketch below).
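Once TGI is up on the pod, querying it from anywhere is a short script. This is a rough sketch: the host and port are placeholders for your pod's public endpoint, and you'd need to have launched TGI with your model (and its quantization option) on the pod first:

```python
# Minimal sketch: call a TGI server running on a RunPod VM.
import requests

TGI_URL = "http://YOUR-POD-HOST:8080/generate"  # placeholder: your pod's endpoint

payload = {
    "inputs": "Explain quantization in one sentence.",
    "parameters": {"max_new_tokens": 100, "temperature": 0.7},
}
resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```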