crm_llm_leaderboard / crm-results / hf_leaderboard_latency_cost.csv (commit 84ee137 by yibum: "join cost table")
Model Name,Cost and Speed: Flavor,Version,Platform,Response Time (Sec),Mean Output Tokens,Mean Cost per 1K Requests,Cost Band,,Model id,Cost per 1M input tokens (USD),Cost per 1M output tokens (USD),,,,Percentile,From,To,,Min,Max
AI21 Jamba-Instruct,Long,,AI21,4.0,232.9,1.6,Medium,,GPT 3.5 Turbo,0.5,1.5,,,0%,0.43,0.43,1.61,,0.43,61.11
AI21 Jamba-Instruct,Short,,AI21,4.0,243.9,0.5,Low,,GPT 4 Turbo,10,30,,,33%,1.61,1.61,9.28,,,
Claude 3 Haiku,Long,,Bedrock,2.8,236.9,1.0,Low,,GPT4-o,5,15,,,67%,9.28,9.28,61.11,,,
Claude 3 Haiku,Short,,Bedrock,2.2,245.4,0.4,Low,,Claude 3 Haiku,0.25,1.25,,,100%,61.11,,,,,
Claude 3 Opus,Long,,Bedrock,12.2,242.7,61.1,High,,Claude 3 Opus,15,75,,,,,,,,,
Claude 3 Opus,Short,,Bedrock,8.4,243.2,25.4,High,,AI21 Jamba-Instruct,0.5,0.7,,,,,,,,,
Cohere Command R+,Long,,Bedrock,7.7,245.7,11.7,High,,Cohere Command Text,1.5,2,,,,,,,,,
Cohere Command R+,Short,,Bedrock,7.1,249.9,5.1,Medium,,Cohere Command R+,3,15,,,,,,,,,
Cohere Command Text,Long,,Bedrock,12.9,238.7,4.3,Medium,,Gemini Pro 1,0.5,1.5,,,,,,,,,
Cohere Command Text,Short,,Bedrock,9.6,245.6,1.1,Low,,Gemini Pro 1.5,3.5,7,,,,,,,,,
Gemini Pro 1.5,Long,,Google,5.5,245.7,11.0,High,,,,,,,,,,,,,
Gemini Pro 1.5,Short,,Google,5.4,247.5,3.3,Medium,,,,,,,,,,,,,
Gemini Pro 1,Long,,Google,6.0,228.9,1.7,Medium,,,,,,,,,,,,,
Gemini Pro 1,Short,,Google,4.4,247.4,0.6,Low,,,,,,,,,,,,,
GPT 3.5 Turbo,Long,,OpenAI,4.5,249.9,1.6,Low,,,,,,,,,,,,,
GPT 3.5 Turbo,Short,,OpenAI,4.2,238.3,0.6,Low,,,,,,,,,,,,,
GPT 4 Turbo,Long,,OpenAI,12.3,247.6,32.0,High,,,,,,,,,,,,,
GPT 4 Turbo,Short,,OpenAI,12.3,250.0,11.7,High,,,,,,,,,,,,,
GPT4-o,Long,,OpenAI,5.1,248.4,15.9,High,,,,,,,,,,,,,
GPT4-o,Short,,OpenAI,5.0,250.0,5.8,Medium,,,,,,,,,,,,,
Mistral 7B,Long,Mistral-7B-Instruct-v0.2,Self-host (g5.48xlarge),8.83,242.0,16.5,High,,,,,,,,,,,,,
Mistral 7B,Short,Mistral-7B-Instruct-v0.2,Self-host (g5.48xlarge),8.31,247.0,15.5,High,,,,,,,,,,,,,
LLaMA 3 8B,Long,Meta-Llama-3-8B-Instruct,Self-host (g5.48xlarge),3.76,251.5,7.0,Medium,,,,,,,,,,,,,
LLaMA 3 8B,Short,Meta-Llama-3-8B-Instruct,Self-host (g5.48xlarge),3.23,243.6,6.0,Medium,,,,,,,,,,,,,
LLaMA 3 70B,Long,llama-3-70b-instruct,Self-host (p4d.24xlarge),20.1,243.9,67.7,High,,,,,,,,,,,,,
LLaMA 3 70B,Short,llama-3-70b-instruct,Self-host (p4d.24xlarge),29.4,251.2,99.0,High,,,,,,,,,,,,,
Mixtral 8x7B,Long,mixtral-8x7b-instruct,Self-host (p4d.24xlarge),2.44,248.5,8.22,Medium,,,,,,,,,,,,,
Mixtral 8x7B,Short,mixtral-8x7b-instruct,Self-host (p4d.24xlarge),2.41,250.0,8.11,Medium,,,,,,,,,,,,,
SF-TextBase 7B,Long,CRM-TextBase-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.99,248.5,16.80,High,,,,,,,,,,,,,
SF-TextBase 7B,Short,CRM-TextBase-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.29,248.7,15.50,High,,,,,,,,,,,,,
SF-TextBase 70B,Long,TextBase-70B-8K,Self-host (p4de.24xlarge),6.52,253.7,28.17,High,,,,,,,,,,,,,
SF-TextBase 70B,Short,TextBase-70B-8K,Self-host (p4de.24xlarge),6.24,249.7,26.96,High,,,,,,,,,,,,,
SF-TextSum,Long,CRM-TSUM-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.85,244.0,16.55,High,,,,,,,,,,,,,
SF-TextSum,Short,CRM-TSUM-7b-22k-g5 (endpoint),Self-host (g5.48xlarge),8.34,250.4,15.60,High,,,,,,,,,,,,,
XGen 2,Long,EinsteinXgen2E4DSStreaming (endpoint),Self-host (p4de.24xlarge),3.71,250.0,16.03,High,Unable to get a response for large-token requests (5K-token input),,,,,,,,,,,,
XGen 2,Short,EinsteinXgen2E4DSStreaming (endpoint),Self-host (p4de.24xlarge),2.64,250.0,11.40,High,,,,,,,,,,,,,
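The Cost Band column appears to follow the percentile thresholds in the side table (0.43 at 0%, 1.61 at 33%, 9.28 at 67%, 61.11 at 100%). Costs below 1.61 are Low, costs from 1.61 up to 9.28 are Medium, and higher costs are High. The per-1M-token prices also appear to reproduce the "Mean Cost per 1K Requests" values for the API models if cost is the sum of input and output token charges. The minimal sketch below rests on those assumptions. The function names `cost_band`, `cost_per_1k_requests`, and `implied_input_tokens` are hypothetical and are not taken from the leaderboard's code. Mean input length is not stored in this file, so the sketch estimates it by inverting the cost formula. Two caveats apply. Costs are stored rounded, so boundary rows cannot be reproduced from the displayed values: AI21 Jamba-Instruct Long and GPT 3.5 Turbo Long both show 1.6 but are banded Medium and Low. The listed Max (61.11) is also below the LLaMA 3 70B costs (67.7, 99.0), which suggests the thresholds may have been computed over the API-priced rows only.

```python
# Sketch only: the banding rule and cost formula are assumptions inferred from
# the values in this file, not the leaderboard's actual implementation.
from bisect import bisect_right

# Assumed band edges: the 33% and 67% percentile thresholds from the side table.
BAND_EDGES = [1.61, 9.28]
BAND_NAMES = ["Low", "Medium", "High"]


def cost_band(cost_per_1k: float) -> str:
    """Map a cost per 1K requests to a band; each edge belongs to the band above it."""
    return BAND_NAMES[bisect_right(BAND_EDGES, cost_per_1k)]


def cost_per_1k_requests(input_tokens: float, output_tokens: float,
                         usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    """Cost of 1,000 requests, assuming cost = input charge + output charge."""
    per_request = (input_tokens * usd_per_1m_in
                   + output_tokens * usd_per_1m_out) / 1_000_000
    return 1000 * per_request


def implied_input_tokens(cost_per_1k: float, output_tokens: float,
                         usd_per_1m_in: float, usd_per_1m_out: float) -> float:
    """Invert cost_per_1k_requests to estimate the mean input length per request."""
    # C = (I * p_in + O * p_out) / 1000  =>  I = (1000 * C - O * p_out) / p_in
    return (1000 * cost_per_1k - output_tokens * usd_per_1m_out) / usd_per_1m_in


if __name__ == "__main__":
    # GPT4-o, Long: 15.9 per 1K requests, 248.4 mean output tokens, $5 / $15 per 1M tokens
    print(cost_band(15.9))  # High
    print(round(implied_input_tokens(15.9, 248.4, 5, 15)))  # 2435
```

`bisect_right` keeps the band lookup correct if more percentile edges are added later. Applying the same inversion to the other API rows gives long-flavor inputs of roughly 2.4K to 2.9K tokens. Those figures are estimates derived from the assumed formula, not measured values.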