jeiku/Aura-NeMo-12B

#2
opened by Jebadiah

No description provided.

Featherless Serverless LLM org (edited 24 days ago)

Hi @Jebadiah,

Thanks for visiting.

Could you tell me more about what's on your mind?

I can't tell what you're looking for with this thread.

Featherless Serverless LLM org

Hey again @Jebadiah,

If you're hoping to see https://huggingface.co/jeiku/Aura-NeMo-12B made available for inference here, unfortunately we aren't able to support it at this time.

Two blockers:

  1. Our inference stack operates on model cards that contain full models. While a LoRA is a convenient (and space-efficient) way to specify a model, the Featherless model execution pipeline can't run adapters on their own (yet).
  2. The base model is a Q4 quant. That's efficient for fine-tuning, but our inference stack runs all models at FP8, and we don't currently support lower quants.

If you find a card that overcomes these two limitations, please let us know!
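In case it helps, here's a rough sketch of how such a card could be produced from the adapter: load an unquantized base in full precision, fold the LoRA weights into it with peft, and push the merged model as its own repo. The base repo id and the destination repo name below are assumptions on my part, not a verified recipe.

```python
# Rough sketch, not an official Featherless recipe: merge the LoRA adapter
# into an unquantized base model and publish the result as a full-weight card.
# The base repo id and destination repo name are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-Nemo-Base-2407"  # assumed unquantized NeMo 12B base
adapter_id = "jeiku/Aura-NeMo-12B"            # the LoRA adapter discussed here

# Load the base weights in bf16 rather than a Q4 quant.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach the adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

# Upload the merged full-precision model; a card like this would clear both blockers.
merged.push_to_hub("your-username/Aura-NeMo-12B-merged")  # hypothetical repo name
tokenizer.push_to_hub("your-username/Aura-NeMo-12B-merged")
```

The resulting repo would hold full model weights at full precision, which is the kind of card our pipeline can serve.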
