jeiku/Aura-NeMo-12B

#2
opened by Jebadiah

No description provided.

Featherless Serverless LLM org (edited 24 days ago)

Hi @Jebadiah,

Thanks for visiting.

Could you tell me more about what's on your mind?

I can't tell what you're looking for with this thread.

Featherless Serverless LLM org

Hey again @Jebadiah,

If you're hoping to see https://huggingface.co/jeiku/Aura-NeMo-12B made available for inference here, unfortunately we aren't able to support it at this time.

Two blockers:

  1. Our inference stack operates on model cards that contain full models. While a LoRA is a convenient (and space-efficient) way to specify a model, the Featherless model execution pipeline can't run adapters on their own (yet).
  2. The base model is a Q4 quant. That's efficient for fine-tuning, but our inference stack runs all models at FP8, and we don't currently support lower quants.

If you find a card that overcomes these two limitations, please let us know!
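In case it helps, here's a rough sketch of how such a card could be produced from the adapter: load an unquantized base in full precision, fold the LoRA weights into it with peft, and push the merged model as its own repo. The base repo id and the destination repo name below are assumptions on my part, not a verified recipe.

```python
# Rough sketch, not an official Featherless recipe: merge the LoRA adapter
# into an unquantized base model and publish the result as a full-weight card.
# The base repo id and destination repo name are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Nemo-Base-2407"  # assumed unquantized NeMo 12B base
adapter_id = "jeiku/Aura-NeMo-12B"            # the LoRA adapter discussed here

# Load the base weights in bf16 rather than a Q4 quant.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

# Upload the merged full-precision model; a card like this would clear both blockers.
merged.push_to_hub("your-username/Aura-NeMo-12B-merged")  # hypothetical repo name
tokenizer.push_to_hub("your-username/Aura-NeMo-12B-merged")
```

The resulting repo would hold full model weights at full precision, which is the kind of card our pipeline can serve.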
