| Model | Human-like | Cost | Consistency | Score 1 | Score 2 | Score 3 | Total | Company |
|---|---|---|---|---|---|---|---|---|
| human | 1.00 | | 0.94 | 10.0 | 10.0 | | 10.0 | human |
| claude-3-opus | 0.91 | 6.21¢ | 0.99 | 10.0 | 10.0 | 10.0 | 10.0 | anthropic |
| claude-3-opus:beta | 0.91 | 6.87¢ | 0.99 | 8.0 | 10.0 | 10.0 | 9.3 | anthropic |
| llama-3-8b-instruct:free | 0.93 | 0.00¢ | 0.97 | 10.0 | 7.0 | 10.0 | 9.0 | meta-llama |
| gpt-4-vision-preview | 0.93 | 1.65¢ | 0.97 | 9.0 | 9.0 | 9.0 | 9.0 | openai |
| gpt-4-turbo-preview | 0.85 | 2.17¢ | 0.96 | 9.0 | 9.0 | 9.0 | 9.0 | openai |
| gpt-4-turbo | 0.87 | 2.25¢ | 0.96 | 9.0 | 9.0 | 9.0 | 9.0 | openai |
| gpt-4-1106-preview | 0.91 | 2.04¢ | 0.97 | 7.0 | 10.0 | 9.0 | 8.7 | openai |
| gpt-4-0314 | 0.91 | 3.41¢ | 0.96 | 10.0 | 8.0 | 7.5 | 8.5 | openai |
| claude-instant-1.1 | 0.90 | 0.10¢ | 0.97 | 9.0 | 8.0 | 8.0 | 8.3 | anthropic |
| phind-codellama-34b | 0.89 | 0.06¢ | 0.95 | 8.0 | 9.0 | 7.0 | 8.0 | phind |
| mistral-large | 0.90 | 1.51¢ | 0.95 | 8.0 | 8.0 | 8.0 | 8.0 | mistralai |
| gpt-4-32k-0314 | 0.89 | 6.07¢ | 0.96 | 8.0 | 8.0 | 8.0 | 8.0 | openai |
| wizardlm-2-7b | 0.85 | 0.01¢ | 0.96 | 6.5 | 9.0 | 8.0 | 7.8 | microsoft |
| llama-3-70b-instruct:nitro | 0.90 | 0.06¢ | 0.95 | 8.0 | 6.0 | 9.0 | 7.7 | meta-llama |
| wizardlm-2-8x22b | 0.84 | 0.07¢ | 0.98 | 7.0 | 9.0 | 7.0 | 7.7 | microsoft |
| claude-instant-1.2 | 0.94 | 0.09¢ | 0.98 | 6.0 | 8.0 | 9.0 | 7.7 | anthropic |
| claude-3-haiku:beta | 0.91 | 0.10¢ | 0.97 | 8.0 | 9.0 | 6.0 | 7.7 | anthropic |
| gemini-pro-1.5 | 0.89 | 2.21¢ | 0.98 | 9.0 | 6.0 | 8.0 | 7.7 | google |
| gpt-4-32k | 0.92 | 7.18¢ | 0.95 | 7.0 | 7.0 | 9.0 | 7.7 | openai |
| claude-3.5-sonnet:beta | 0.91 | 1.04¢ | 0.98 | 7.5 | 7.0 | 8.0 | 7.5 | anthropic |
| deepseek-coder | 0.85 | 0.02¢ | 0.97 | 8.0 | 8.0 | 6.0 | 7.3 | deepseek |
| nous-hermes-2-mixtral-8x7b-dpo | 0.87 | 0.04¢ | 0.97 | 8.0 | 8.0 | 6.0 | 7.3 | nousresearch |
| llama-3-70b-instruct | 0.87 | 0.05¢ | 0.98 | 6.0 | 7.0 | 9.0 | 7.3 | meta-llama |
| sonar-medium-chat | 0.88 | 0.05¢ | 0.96 | 7.0 | 7.0 | 8.0 | 7.3 | perplexity |
| jamba-instruct | 0.87 | 0.08¢ | 0.96 | 8.0 | 8.0 | 6.0 | 7.3 | ai21 |
| claude-instant-1:beta | 0.93 | 0.08¢ | 0.98 | 8.0 | 5.0 | 9.0 | 7.3 | anthropic |
| claude-3-haiku | 0.91 | 0.09¢ | 0.97 | 8.0 | 6.0 | 8.0 | 7.3 | anthropic |
| gpt-3.5-turbo-0301 | 0.92 | 0.10¢ | 0.93 | 5.0 | 9.0 | 8.0 | 7.3 | openai |
| snowflake-arctic-instruct | 0.90 | 0.16¢ | 0.96 | 8.0 | 8.0 | 6.0 | 7.3 | snowflake |
| gpt-4o-2024-05-13 | 0.85 | 1.01¢ | 0.97 | 9.0 | 7.0 | 6.0 | 7.3 | openai |
| claude-3.5-sonnet | 0.91 | 1.09¢ | 0.98 | 6.0 | 8.0 | 8.0 | 7.3 | anthropic |
| gpt-4 | 0.89 | 3.25¢ | 0.95 | 8.0 | 6.0 | 8.0 | 7.3 | openai |
| qwen-72b-chat | 0.89 | 0.06¢ | 0.96 | 6.0 | 7.5 | 8.0 | 7.2 | qwen |
| claude-3-sonnet:beta | 0.93 | 1.11¢ | 0.96 | 6.5 | 8.0 | 7.0 | 7.2 | anthropic |
| openchat-7b:free | 0.90 | 0.00¢ | 0.94 | 7.0 | 5.0 | 9.0 | 7.0 | openchat |
| hermes-2-pro-llama-3-8b | 0.90 | 0.01¢ | 0.93 | 5.0 | 9.0 | 7.0 | 7.0 | nousresearch |
| qwen-32b-chat | 0.88 | 0.05¢ | 0.95 | 6.0 | 5.0 | 10.0 | 7.0 | qwen |
| codellama-70b-instruct | 0.90 | 0.06¢ | 0.91 | 5.0 | 9.0 | 7.0 | 7.0 | meta-llama |
| gpt-3.5-turbo-0613 | 0.91 | 0.11¢ | 0.96 | 8.0 | 7.0 | 6.0 | 7.0 | openai |
| palm-2-codechat-bison | 0.90 | 0.12¢ | 0.97 | 8.0 | 5.0 | 8.0 | 7.0 | google |
| claude-2.1 | 0.91 | 0.95¢ | 0.98 | 9.0 | 7.0 | 5.0 | 7.0 | anthropic |
| llama-3-8b-instruct:nitro | 0.89 | 0.01¢ | 0.93 | 8.0 | 5.5 | 7.0 | 6.8 | meta-llama |
| llama-3-lumimaid-70b | 0.89 | 0.48¢ | 0.95 | 5.5 | 8.0 | 7.0 | 6.8 | neversleep |
| dbrx-instruct:nitro | 0.77 | 0.04¢ | 0.74 | 6.0 | 7.5 | | 6.8 | databricks |
| openchat-7b | 0.90 | 0.01¢ | 0.95 | 7.0 | 9.0 | 4.0 | 6.7 | openchat |
| llama-3-sonar-small-32k-online | 0.89 | 0.01¢ | 0.95 | 8.0 | 6.0 | 6.0 | 6.7 | perplexity |
| lzlv-70b-fp16-hf | 0.89 | 0.08¢ | 0.95 | 5.0 | 7.0 | 8.0 | 6.7 | lizpreciatior |
| gpt-3.5-turbo | 0.87 | 0.08¢ | 0.97 | 4.0 | 8.0 | 8.0 | 6.7 | openai |
| claude-instant-1 | 0.92 | 0.09¢ | 0.96 | 6.0 | 6.0 | 8.0 | 6.7 | anthropic |
| llama-3-lumimaid-8b | 0.88 | 0.14¢ | 0.95 | 5.0 | 7.0 | 8.0 | 6.7 | neversleep |
| claude-2 | 0.93 | 0.88¢ | 0.98 | 8.0 | 5.0 | 7.0 | 6.7 | anthropic |
| claude-2:beta | 0.93 | 0.96¢ | 0.98 | 5.0 | 6.0 | 9.0 | 6.7 | anthropic |
| claude-3-sonnet | 0.92 | 1.06¢ | 0.96 | 5.0 | 8.0 | 7.0 | 6.7 | anthropic |
| nous-hermes-2-mistral-7b-dpo | 0.89 | 0.02¢ | 0.93 | 8.0 | 5.5 | 6.0 | 6.5 | nousresearch |
| llama-3-8b-instruct | 0.91 | 0.00¢ | 0.94 | 10.0 | 6.0 | 3.0 | 6.3 | meta-llama |
| openchat-8b | 0.89 | 0.00¢ | 0.95 | 6.0 | 7.0 | 6.0 | 6.3 | openchat |
| phi-3-medium-4k-instruct | 0.89 | 0.01¢ | 0.92 | 9.0 | 3.0 | 7.0 | 6.3 | microsoft |
| mixtral-8x22b-instruct | 0.86 | 0.05¢ | 0.97 | 7.0 | 6.0 | 6.0 | 6.3 | mistralai |
| gpt-3.5-turbo-0125 | 0.88 | 0.07¢ | 0.97 | 6.0 | 7.0 | 6.0 | 6.3 | openai |
| gemini-flash-1.5 | 0.85 | 0.07¢ | 0.97 | 6.0 | 7.0 | 6.0 | 6.3 | google |
| command-r-plus | 0.92 | 0.97¢ | 0.96 | 7.0 | 6.0 | 6.0 | 6.3 | cohere |
| llama-3-sonar-large-32k-chat | 0.88 | 0.06¢ | 0.98 | 5.0 | 7.5 | 6.0 | 6.2 | perplexity |
| qwen-14b-chat | 0.91 | 0.02¢ | 0.97 | 6.0 | 4.0 | 8.0 | 6.0 | qwen |
| dbrx-instruct | 0.43 | 0.02¢ | 0.75 | 6.0 | | | 6.0 | databricks |
| deepseek-chat | 0.89 | 0.02¢ | 0.97 | 6.0 | 7.0 | 5.0 | 6.0 | deepseek |
| zephyr-orpo-141b-a35b | 0.87 | 0.06¢ | 0.95 | 7.0 | 6.0 | 5.0 | 6.0 | huggingfaceh4 |
| gpt-3.5-turbo-16k | 0.90 | 0.20¢ | 0.97 | 4.0 | 6.0 | 8.0 | 6.0 | openai |
| nemotron-4-340b-instruct | 0.87 | 0.28¢ | 0.98 | 6.0 | 5.0 | 7.0 | 6.0 | nvidia |
| mistral-medium | 0.89 | 0.52¢ | 0.95 | 6.0 | 5.0 | 7.0 | 6.0 | mistralai |
| midnight-rose-70b | 0.87 | 0.85¢ | 0.96 | 4.0 | 6.0 | 8.0 | 6.0 | sophosympatheia |
| gpt-4o | 0.88 | 1.05¢ | 0.96 | 6.0 | 4.0 | 8.0 | 6.0 | openai |
| llama-3-sonar-small-32k-chat | 0.93 | 0.01¢ | 0.95 | 7.0 | 4.0 | 6.0 | 5.7 | perplexity |
| mixtral-8x7b-instruct | 0.91 | 0.02¢ | 0.96 | 5.0 | 6.0 | 6.0 | 5.7 | mistralai |
| gemini-pro-vision | 0.82 | 0.04¢ | 0.94 | 5.0 | 6.0 | 6.0 | 5.7 | google |
| llama-3-sonar-large-32k-online | 0.90 | 0.06¢ | 0.94 | 6.0 | 5.0 | 6.0 | 5.7 | perplexity |
| gpt-3.5-turbo-1106 | 0.88 | 0.08¢ | 0.96 | 5.0 | 5.0 | 7.0 | 5.7 | openai |
| pplx-70b-chat | 0.84 | 0.09¢ | 0.94 | 4.0 | 9.0 | 4.0 | 5.7 | perplexity |
| llama-3-8b-instruct:extended | 0.92 | 0.11¢ | 0.96 | 7.0 | 5.0 | 5.0 | 5.7 | meta-llama |
| pplx-70b-online | 0.91 | 0.59¢ | 0.94 | 6.0 | 7.0 | 4.0 | 5.7 | perplexity |
| claude-2.0:beta | 0.90 | 1.00¢ | 0.98 | 6.0 | 6.0 | 5.0 | 5.7 | anthropic |
| mistral-tiny | 0.91 | 0.02¢ | 0.96 | 5.0 | 7.5 | 4.0 | 5.5 | mistralai |
| phi-3-medium-128k-instruct:free | 0.89 | 0.00¢ | 0.91 | 7.0 | 5.0 | 4.0 | 5.3 | microsoft |
| mistral-7b-instruct:nitro | 0.90 | 0.02¢ | 0.95 | 6.0 | 6.0 | 4.0 | 5.3 | mistralai |
| palm-2-chat-bison-32k | 0.88 | 0.06¢ | 0.94 | 6.0 | 4.0 | 6.0 | 5.3 | google |
| qwen-110b-chat | 0.88 | 0.10¢ | 0.96 | 5.0 | 5.0 | 6.0 | 5.3 | qwen |
| llama-3-lumimaid-8b:extended | 0.91 | 0.12¢ | 0.95 | 5.0 | 5.0 | 6.0 | 5.3 | neversleep |
| claude-1.2 | 0.92 | 0.77¢ | 0.97 | 5.0 | 5.0 | 6.0 | 5.3 | anthropic |
| claude-2.0 | 0.88 | 0.92¢ | 0.98 | 6.0 | 6.0 | 4.0 | 5.3 | anthropic |
| claude-1 | 0.90 | 1.05¢ | 0.97 | 5.0 | 6.0 | 5.0 | 5.3 | anthropic |
| mistral-7b-instruct-v0.3 | 0.88 | 0.01¢ | 0.96 | 5.0 | 6.0 | 4.0 | 5.0 | mistralai |
| toppy-m-7b | 0.89 | 0.01¢ | 0.93 | 7.0 | 3.0 | 5.0 | 5.0 | undi95 |
| mixtral-8x7b-instruct:nitro | 0.85 | 0.05¢ | 0.96 | 4.0 | 5.0 | 6.0 | 5.0 | mistralai |
| mixtral-8x22b-instruct-preview | 0.85 | 0.08¢ | 0.95 | 5.0 | 6.0 | 4.0 | 5.0 | fireworks |
| command-r | 0.85 | 0.08¢ | 0.95 | 5.0 | 5.0 | 5.0 | 5.0 | cohere |
| claude-instant-1.0 | 0.87 | 0.09¢ | 0.93 | 6.0 | 5.0 | 4.0 | 5.0 | anthropic |
| nous-hermes-2-mixtral-8x7b-sft | 0.88 | 0.04¢ | 0.95 | 4.0 | 6.0 | 4.0 | 4.7 | nousresearch |
| nous-capybara-34b | 0.93 | 0.06¢ | 0.97 | 4.0 | 4.0 | 6.0 | 4.7 | nousresearch |
| gemini-pro | 0.85 | 0.07¢ | 0.96 | 5.0 | 4.0 | 5.0 | 4.7 | google |
| airoboros-l2-70b | 0.87 | 0.07¢ | 0.95 | 4.0 | 3.0 | 7.0 | 4.7 | jondurbin |
| xwin-lm-70b | 0.84 | 0.22¢ | 0.96 | 4.0 | 6.0 | 4.0 | 4.7 | xwin-lm |
| fimbulvetr-11b-v2 | 0.85 | 0.26¢ | 0.97 | 5.0 | 4.0 | 5.0 | 4.7 | sao10k |
| toppy-m-7b:free | 0.90 | 0.00¢ | 0.94 | 4.5 | 4.0 | 5.0 | 4.5 | undi95 |
| zephyr-7b-beta:free | 0.88 | 0.00¢ | 0.98 | 4.0 | 4.0 | 5.0 | 4.3 | huggingfaceh4 |
| mistral-7b-instruct | 0.88 | 0.01¢ | 0.93 | 3.0 | 7.0 | 3.0 | 4.3 | mistralai |
| phi-3-mini-128k-instruct | 0.88 | 0.01¢ | 0.94 | 6.0 | 4.0 | 3.0 | 4.3 | microsoft |
| mythomax-l2-13b:nitro | 0.91 | 0.01¢ | 0.92 | 4.0 | 3.0 | 6.0 | 4.3 | gryphe |
| mythomax-l2-13b | 0.91 | 0.01¢ | 0.95 | 3.0 | 5.0 | 5.0 | 4.3 | gryphe |
| openhermes-2.5-mistral-7b | 0.87 | 0.02¢ | 0.93 | 3.0 | 6.0 | 4.0 | 4.3 | teknium |
| phi-3-medium-128k-instruct | 0.92 | 0.07¢ | 0.88 | 8.0 | 1.0 | 4.0 | 4.3 | microsoft |
| llava-yi-34b | 0.91 | 0.07¢ | 0.93 | 3.0 | 3.0 | 7.0 | 4.3 | liuhaotian |
| nous-hermes-yi-34b | 0.88 | 0.11¢ | 0.91 | 6.0 | 6.0 | 1.0 | 4.3 | nousresearch |
| sonar-small-online | 0.88 | 0.51¢ | 0.94 | 5.0 | 6.0 | 2.0 | 4.3 | perplexity |
| pplx-7b-online | 0.88 | 0.52¢ | 0.92 | 3.0 | 6.0 | 4.0 | 4.3 | perplexity |
| noromaid-mixtral-8x7b-instruct | 0.85 | 0.75¢ | 0.95 | 3.0 | 5.0 | 5.0 | 4.3 | neversleep |
| claude-2.1:beta | 0.88 | 0.81¢ | 0.99 | 5.0 | 4.0 | 4.0 | 4.3 | anthropic |
| cinematika-7b:free | 0.81 | 0.00¢ | 1.00 | 4.0 | 4.0 | 4.0 | 4.0 | openrouter |
| phi-3-mini-128k-instruct:free | 0.88 | 0.00¢ | 0.95 | 5.0 | 4.0 | 3.0 | 4.0 | microsoft |
| toppy-m-7b:nitro | 0.89 | 0.01¢ | 0.96 | 3.0 | 4.0 | 5.0 | 4.0 | undi95 |
| mistral-7b-instruct-v0.2 | 0.89 | 0.01¢ | 0.94 | 4.0 | 3.0 | 5.0 | 4.0 | mistralai |
| qwen-7b-chat | 0.88 | 0.01¢ | 0.95 | 4.0 | 4.0 | 4.0 | 4.0 | qwen |
| mythomist-7b | 0.89 | 0.03¢ | 0.94 | 5.0 | 3.0 | 4.0 | 4.0 | gryphe |
| dolphin-mixtral-8x7b | 0.90 | 0.03¢ | 0.93 | 4.0 | 4.0 | 4.0 | 4.0 | cognitivecomputations |
| yi-34b-chat | 0.84 | 0.06¢ | 1.00 | 4.0 | 4.0 | 4.0 | 4.0 | 01-ai |
| palm-2-codechat-bison-32k | 0.80 | 0.07¢ | 0.75 | 7.0 | 0.0 | 5.0 | 4.0 | google |
| psyfighter-13b | 0.83 | 0.07¢ | 0.99 | 4.0 | 5.0 | 3.0 | 4.0 | jebcarter |
| llama-2-70b-chat | 0.89 | 0.16¢ | 0.96 | 5.0 | 3.0 | 4.0 | 4.0 | meta-llama |
| neural-chat-7b | 0.86 | 0.31¢ | 0.93 | 4.0 | 3.0 | 5.0 | 4.0 | intel |
| bagel-34b | 0.91 | 0.40¢ | 0.96 | 4.0 | 3.0 | 5.0 | 4.0 | jondurbin |
| nous-capybara-7b | 0.87 | 0.01¢ | 0.96 | 4.0 | 4.0 | 3.0 | 3.7 | nousresearch |
| chronos-hermes-13b | 0.91 | 0.02¢ | 0.95 | 4.0 | 4.0 | 3.0 | 3.7 | austism |
| palm-2-chat-bison | 0.89 | 0.05¢ | 0.97 | 5.0 | 3.0 | 3.0 | 3.7 | google |
| codellama-34b-instruct | 0.86 | 0.06¢ | 0.95 | 3.0 | 4.0 | 4.0 | 3.7 | meta-llama |
| remm-slerp-l2-13b:extended | 0.88 | 0.07¢ | 0.92 | 5.0 | 4.0 | 2.0 | 3.7 | undi95 |
| gpt-3.5-turbo-instruct | 0.82 | 0.14¢ | 0.95 | 4.0 | 5.0 | 2.0 | 3.7 | openai |
| sonar-medium-online | 0.85 | 0.55¢ | 0.95 | 3.0 | 2.0 | 6.0 | 3.7 | perplexity |
| gemma-7b-it | 0.80 | 0.00¢ | 0.97 | 3.0 | 3.0 | 4.0 | 3.3 | google |
| mistral-7b-openorca | 0.87 | 0.01¢ | 0.94 | 4.0 | 3.0 | 3.0 | 3.3 | open-orca |
| stripedhyena-nous-7b | 0.88 | 0.01¢ | 0.92 | 3.0 | 3.0 | 4.0 | 3.3 | togethercomputer |
| pplx-7b-chat | 0.84 | 0.02¢ | 0.89 | 3.0 | 2.0 | 5.0 | 3.3 | perplexity |
| mythomax-l2-13b:extended | 0.90 | 0.02¢ | 0.93 | 3.0 | 3.0 | 4.0 | 3.3 | gryphe |
| nous-hermes-llama2-13b | 0.90 | 0.02¢ | 0.94 | 3.0 | 3.0 | 4.0 | 3.3 | nousresearch |
| remm-slerp-l2-13b | 0.91 | 0.02¢ | 0.93 | 3.0 | 3.0 | 4.0 | 3.3 | undi95 |
| codellama-70b-instruct | 0.84 | 0.06¢ | 0.89 | 5.0 | 4.0 | 1.0 | 3.3 | codellama |
| llama-2-70b-chat:nitro | 0.89 | 0.07¢ | 0.97 | 4.0 | 4.0 | 2.0 | 3.3 | meta-llama |
| psyfighter-13b-2 | 0.88 | 0.07¢ | 0.92 | 4.0 | 3.0 | 3.0 | 3.3 | koboldai |
| synthia-70b | 0.88 | 0.22¢ | 0.94 | 4.0 | 3.0 | 3.0 | 3.3 | migtissera |
| rwkv-5-world-3b | 0.78 | 0.00¢ | 1.00 | 3.0 | 3.0 | 3.0 | 3.0 | rwkv |
| qwen-4b-chat | 0.91 | 0.00¢ | 0.92 | 3.0 | 3.0 | 3.0 | 3.0 | qwen |
| cinematika-7b | 0.82 | 0.01¢ | 1.00 | 3.0 | 3.0 | 3.0 | 3.0 | openrouter |
| openhermes-2-mistral-7b | 0.90 | 0.01¢ | 1.00 | 3.0 | 3.0 | 3.0 | 3.0 | teknium |
| mixtral-8x22b | 0.86 | 0.04¢ | 0.94 | 6.0 | 0.0 | 3.0 | 3.0 | mistralai |
| nous-capybara-7b:free | 0.81 | 0.00¢ | 0.92 | 2.0 | 2.0 | 4.0 | 2.7 | nousresearch |
| gemma-7b-it:nitro | 0.81 | 0.01¢ | 0.95 | 3.0 | 2.0 | 3.0 | 2.7 | google |
| firellava-13b | 0.86 | 0.01¢ | 0.89 | 3.0 | 1.0 | 4.0 | 2.7 | fireworks |
| mistral-7b-instruct-v0.1 | 0.91 | 0.01¢ | 0.95 | 3.0 | 2.0 | 3.0 | 2.7 | mistralai |
| sonar-small-chat | 0.88 | 0.01¢ | 0.91 | 3.0 | 3.0 | 2.0 | 2.7 | perplexity |
| mythalion-13b | 0.91 | 0.06¢ | 0.96 | 2.0 | 3.0 | 3.0 | 2.7 | pygmalionai |
| dolphin-mixtral-8x22b | 0.73 | 0.08¢ | 0.88 | 1.0 | 2.0 | 5.0 | 2.7 | cognitivecomputations |
| mythomist-7b:free | 0.87 | 0.00¢ | 0.94 | 3.0 | 2.0 | 2.0 | 2.3 | gryphe |
| llama-2-13b-chat | 0.82 | 0.02¢ | 0.94 | 2.0 | 2.0 | 3.0 | 2.3 | meta-llama |
| noromaid-20b | 0.87 | 0.13¢ | 0.90 | 4.0 | 2.0 | 1.0 | 2.3 | neversleep |
| goliath-120b | 0.84 | 0.56¢ | 0.91 | 1.0 | 1.0 | 5.0 | 2.3 | alpindale |
| olmo-7b-instruct | 0.83 | 0.02¢ | 0.91 | 1.5 | 1.0 | 4.0 | 2.2 | allenai |
| llama-3-70b | 0.70 | 0.05¢ | 0.83 | 2.5 | 4.0 | 0.0 | 2.2 | meta-llama |
| eagle-7b | 0.72 | 0.00¢ | 0.76 | 4.0 | 1.0 | 1.0 | 2.0 | recursal |
| zephyr-7b-beta | 0.85 | 0.01¢ | 0.90 | 0.0 | 3.0 | 3.0 | 2.0 | huggingfaceh4 |
| command | 0.65 | 0.11¢ | 0.91 | 2.0 | 2.0 | 2.0 | 2.0 | cohere |
| gemma-7b-it:free | 0.73 | 0.00¢ | 0.93 | 1.0 | 3.0 | 1.0 | 1.7 | google |
| mistral-7b-instruct:free | 0.86 | 0.00¢ | 0.85 | 0.0 | 3.0 | 2.0 | 1.7 | mistralai |
| stripedhyena-hessian-7b | 0.67 | 0.01¢ | 0.84 | 1.5 | 1.0 | 2.0 | 1.5 | togethercomputer |
| mistral-small | 0.76 | 0.11¢ | 0.85 | 1.0 | 2.0 | 1.0 | 1.3 | mistralai |
| weaver | 0.56 | 0.22¢ | 0.74 | 3.0 | 0.0 | 0.0 | 1.0 | mancer |
| yi-6b | 0.62 | 0.01¢ | 0.79 | 2.0 | 0.0 | 0.0 | 0.7 | 01-ai |
| mixtral-8x7b | 0.58 | 0.04¢ | 0.76 | 0.0 | 0.0 | 2.0 | 0.7 | mistralai |
| rwkv-5-3b-ai-town | 0.66 | 0.00¢ | 1.00 | 0.0 | 0.0 | 0.0 | 0.0 | recursal |
| soliloquy-l3 | 0.33 | 0.00¢ | 0.72 | 0.0 | 0.0 | 0.0 | 0.0 | lynn |
| llama-guard-2-8b | 0.18 | 0.00¢ | 0.67 | 0.0 | 0.0 | | 0.0 | meta-llama |
| yi-34b | 0.46 | 0.03¢ | 0.67 | 0.0 | 0.0 | 0.0 | 0.0 | 01-ai |
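
The Total column appears to be the arithmetic mean of the available Score columns, rounded to one decimal place (rows with missing scores, such as the dbrx-instruct entries, average only the values present). A minimal sketch of that relationship, assuming this is how Total was derived:

```python
def total_score(scores):
    """Mean of the available scores, rounded to one decimal place.

    `scores` may contain None for missing entries; only the present
    values are averaged (assumed behaviour for incomplete rows).
    """
    present = [s for s in scores if s is not None]
    if not present:
        return None
    return round(sum(present) / len(present), 1)

# Example: claude-3-opus:beta -> (8.0 + 10.0 + 10.0) / 3 ≈ 9.3, matching the table
assert total_score([8.0, 10.0, 10.0]) == 9.3
```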