Gemini 1.5 Flash now supports fine-tuning, and inference on a tuned model costs the same as the base model! <coughs, LoRA adopters>
So the base model must be expensive? Not anymore: the input price has been cut by 78% to $0.075 per million tokens and the output price by 71% to $0.30 per million tokens.
But is it any good? On the LLM Hallucination Index, Gemini 1.5 Flash achieved strong context adherence scores of 0.94, 1.0, and 0.92 across short, medium, and long contexts.
Google has finally shipped a model that is free to fine-tune and offers an excellent balance of performance and cost.
It feels awkward that my first post here is sharing my own work, but this is a weekend project I really enjoyed. I'd love to meet more people interested in random ideas like this.
A hard part of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt?
Predictive human preference aims to predict which model users might prefer for a specific query.
One use case is model routing. If we know in advance that for a prompt, users will prefer Claude Instant’s response over GPT-4’s, and Claude Instant is cheaper/faster than GPT-4, we can route this prompt to Claude Instant. Model routing has the potential to increase response quality while reducing costs and latency.
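Here is a minimal sketch of what such a router could look like. The predictor `predict_preference(prompt, model_a, model_b)` is a hypothetical stand-in for a trained preference model; the toy heuristic inside it (longer prompt = harder = prefer the stronger model) exists purely so the example runs end to end.

```python
# Sketch of preference-based model routing (names and logic illustrative).

def predict_preference(prompt: str, model_a: str, model_b: str) -> float:
    """Toy stand-in for a trained preference predictor: returns the
    estimated probability that users prefer model_a's response over
    model_b's. Longer (presumably harder) prompts shift preference
    toward the stronger model, purely for illustration."""
    p_weak_wins = max(0.1, 0.9 - 0.002 * len(prompt))
    return p_weak_wins if model_a == "claude-instant" else 1.0 - p_weak_wins

def route(prompt: str, threshold: float = 0.5) -> str:
    """Send the prompt to the cheaper model unless users are predicted
    to clearly prefer the stronger model's response."""
    cheap, strong = "claude-instant", "gpt-4"
    if predict_preference(prompt, cheap, strong) >= threshold:
        return cheap  # quality is (nearly) as good, at lower cost/latency
    return strong

print(route("hello, how are you?"))                      # easy -> cheap model
print(route("Explain why Planck length " + "x" * 300))   # hard -> strong model
```

The `threshold` knob is where the cost/quality trade-off lives: raising it routes more traffic to the cheap model and accepts a slightly higher chance of a worse response.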
One pattern is that for simple prompts, weak models can do (nearly) as well as strong models. For more challenging prompts, however, users are more likely to prefer stronger models. Here’s a visualization of predicted human preference for an easy prompt (“hello, how are you?”) and a challenging prompt (“Explain why Planck length …”).
Preference predictors make it possible to create leaderboards unique to any prompt and domain.
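As a sketch of how that could work: given a pairwise preference predictor, you can rank models for a single prompt by their average predicted win rate in a round-robin against every other model. The predictor below is again a toy stand-in, and the model names and "strength" scores are purely illustrative.

```python
from itertools import combinations

def predict_preference(prompt: str, model_a: str, model_b: str) -> float:
    """Toy stand-in for a trained pairwise preference predictor: returns
    P(users prefer model_a's response over model_b's) for this prompt."""
    strength = {"claude-instant": 0.3, "gpt-3.5-turbo": 0.5, "gpt-4": 0.9}
    # Toy assumption: harder (longer) prompts amplify the strength gap.
    difficulty = min(1.0, len(prompt) / 200)
    gap = (strength[model_a] - strength[model_b]) * difficulty
    return 0.5 + 0.5 * gap

def prompt_leaderboard(prompt: str, models: list[str]) -> list[tuple[str, float]]:
    """Rank models for one prompt by average predicted win rate across
    a round-robin against every other model, highest first."""
    wins = {m: 0.0 for m in models}
    for a, b in combinations(models, 2):
        p = predict_preference(prompt, a, b)
        wins[a] += p
        wins[b] += 1.0 - p
    n_opponents = len(models) - 1
    return sorted(((m, w / n_opponents) for m, w in wins.items()),
                  key=lambda s: s[1], reverse=True)

models = ["claude-instant", "gpt-3.5-turbo", "gpt-4"]
print(prompt_leaderboard("hello, how are you?", models))  # near-tie on easy prompts
```

On the easy prompt the win rates come out nearly tied, mirroring the pattern above; a longer, harder prompt would spread the ranking apart.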