Rank | Model | Rating | 95% CI
1 | gemini-2.5-pro-exp-03-25 | 1379 | +7/-7
2 | chatgpt-4o-latest-20250326 | 1360 | +6/-7
2 | gpt-4.5-preview-2025-02-27 | 1356 | +5/-6
4 | grok-3-preview-02-24 | 1327 | +5/-5
4 | deepseek-v3-0324 (open) | 1321 | +8/-7
4 | o1-2024-12-17 | 1320 | +4/-4
4 | chatgpt-4o-latest-20241120 | 1318 | +4/-4
8 | deepseek-r1 (open) | 1316 | +6/-6
8 | claude-3-7-sonnet-20250219-thinking-32k | 1314 | +6/-7
8 | gemini-2.0-flash-thinking-exp-01-21 | 1308 | +4/-5
11 | o1-preview | 1303 | +3/-4
11 | claude-3-7-sonnet-20250219 | 1300 | +5/-7
13 | o3-mini-high | 1287 | +5/-6
13 | qwen2.5-max | 1286 | +4/-6
13 | claude-3-5-sonnet-20241022 | 1285 | +3/-3
13 | gemini-2.0-flash-001 | 1283 | +5/-5
13 | gemma-3-27b-it (open) | 1281 | +6/-6
18 | deepseek-v3 (open) | 1277 | +4/-5
18 | o3-mini | 1269 | +5/-5
20 | gemini-1.5-pro-002 | 1268 | +3/-3
20 | gemini-2.0-flash-lite-preview-02-05 | 1268 | +5/-5
20 | hunyuan-turbos-20250226 | 1265 | +10/-12
20 | command-a-03-2025 (open) | 1263 | +7/-8
20 | gpt-4o-2024-05-13 | 1263 | +2/-3
20 | qwen-plus-0125 | 1262 | +7/-7
26 | claude-3-5-sonnet-20240620 | 1260 | +2/-3
26 | hunyuan-turbo-0110 | 1258 | +9/-11
26 | qwq-32b (open) | 1258 | +6/-8
26 | glm-4-plus-0111 | 1256 | +8/-7
26 | o1-mini | 1255 | +3/-4
26 | llama-3.1-405b-instruct-bf16 (open) | 1254 | +4/-4
32 | llama-3.1-405b-instruct-fp8 (open) | 1253 | +3/-3
32 | llama-4-maverick-17b-128e-instruct (open, new) | 1252 | +9/-10
32 | step-2-16k-exp-202412 | 1251 | +8/-8
32 | gpt-4o-2024-08-06 | 1250 | +3/-3
32 | grok-2-2024-08-13 | 1249 | +3/-4
37 | yi-lightning | 1246 | +4/-4
37 | llama-3.3-nemotron-49b-super-v1 (open) | 1246 | +12/-14
37 | gpt-4-turbo-2024-04-09 | 1242 | +2/-3
37 | hunyuan-large-2025-02-10 | 1242 | +9/-9
37 | yi-lightning-lite | 1239 | +5/-5
42 | claude-3-opus-20240229 | 1239 | +2/-3
42 | llama-3.3-70b-instruct (open) | 1235 | +3/-4
42 | glm-4-plus | 1235 | +3/-4
42 | gpt-4-1106-preview | 1235 | +3/-3
42 | gpt-4o-mini-2024-07-18 | 1234 | +3/-3
42 | claude-3-5-haiku-20241022 | 1233 | +5/-4
48 | mistral-large-2407 (open) | 1232 | +3/-3
48 | qwen2.5-plus-1127 | 1231 | +5/-6
48 | qwen-max-0919 | 1231 | +4/-5
48 | gpt-4-0125-preview | 1231 | +3/-3
48 | athene-v2-chat (open) | 1231 | +5/-4
48 | gemini-1.5-flash-002 | 1229 | +4/-4
48 | hunyuan-standard-2025-02-10 | 1229 | +10/-7
55 | grok-2-mini-2024-08-13 | 1224 | +3/-4
55 | athene-70b-0725 (open) | 1223 | +4/-6
55 | qwen2.5-72b-instruct (open) | 1222 | +4/-4
55 | mistral-large-2411 (open) | 1221 | +4/-4
55 | llama-3.1-nemotron-70b-instruct (open) | 1217 | +5/-6
60 | llama-3.1-70b-instruct (open) | 1214 | +3/-3
60 | llama-3.1-tulu-3-70b (open) | 1209 | +10/-8
60 | amazon-nova-pro-v1.0 | 1209 | +6/-5
60 | reka-core-20240904 | 1207 | +8/-7
64 | gemma-2-27b-it (open) | 1207 | +3/-3
64 | yi-large-preview | 1207 | +3/-3
64 | jamba-1.5-large (open) | 1205 | +5/-7
64 | llama-3.1-nemotron-51b-instruct (open) | 1203 | +11/-9
64 | gemma-2-9b-it-simpo (open) | 1200 | +6/-6
69 | gpt-4-0314 | 1200 | +3/-3
69 | claude-3-sonnet-20240229 | 1197 | +2/-2
69 | command-r-plus-08-2024 (open) | 1197 | +5/-6
69 | nemotron-4-340b-instruct (open) | 1196 | +4/-5
69 | llama-3-70b-instruct (open) | 1195 | +2/-3
69 | yi-large | 1195 | +5/-5
69 | reka-flash-20240904 | 1193 | +8/-7
76 | mistral-small-24b-instruct-2501 (open) | 1190 | +5/-5
76 | qwen2.5-coder-32b-instruct (open) | 1190 | +6/-8
76 | glm-4-0520 | 1188 | +5/-6
76 | gpt-4-0613 | 1185 | +2/-3
76 | c4ai-aya-expanse-32b (open) | 1185 | +4/-4
76 | command-r-plus (open) | 1183 | +2/-3
76 | amazon-nova-lite-v1.0 | 1182 | +4/-5
83 | qwen2-72b-instruct (open) | 1181 | +3/-3
83 | gemma-2-9b-it (open) | 1180 | +3/-3
83 | gemini-1.5-flash-8b-001 | 1179 | +4/-4
83 | claude-3-haiku-20240307 | 1178 | +2/-3
83 | phi-4 (open) | 1176 | +5/-5
83 | command-r-08-2024 (open) | 1176 | +5/-7
89 | qwen-max-0428 | 1172 | +3/-4
90 | amazon-nova-micro-v1.0 | 1164 | +4/-6
90 | glm-4-0116 | 1163 | +7/-7
90 | jamba-1.5-mini (open) | 1160 | +7/-6
90 | ministral-8b-2410 (open) | 1160 | +9/-7
90 | claude-1 | 1159 | +4/-4
90 | mistral-large-2402 | 1158 | +3/-3
90 | hunyuan-standard-256k | 1153 | +11/-10
97 | reka-flash-21b-20240226-online | 1151 | +4/-5
97 | c4ai-aya-expanse-8b (open) | 1151 | +5/-6
97 | mixtral-8x22b-instruct-v0.1 (open) | 1149 | +2/-3
97 | mistral-next | 1149 | +6/-6
97 | command-r (open) | 1148 | +2/-3
97 | llama-3.1-tulu-3-8b (open) | 1147 | +11/-10
97 | claude-2.0 | 1146 | +4/-5
104 | llama-3-8b-instruct (open) | 1143 | +2/-3
104 | reka-flash-21b-20240226 | 1142 | +4/-4
104 | gpt-3.5-turbo-0314 | 1142 | +9/-8
104 | mistral-medium | 1141 | +3/-4
104 | gpt-3.5-turbo-0125 | 1137 | +3/-3
109 | gpt-3.5-turbo-0613 | 1137 | +3/-4
109 | claude-2.1 | 1135 | +4/-4
109 | granite-3.1-8b-instruct (open) | 1135 | +8/-12
109 | yi-1.5-34b-chat (open) | 1133 | +3/-4
109 | llama-3.1-8b-instruct (open) | 1133 | +3/-3
109 | zephyr-orpo-141b-A35b-v0.1 (open) | 1130 | +10/-7
115 | claude-instant-1 | 1127 | +4/-5
116 | phi-3-medium-4k-instruct (open) | 1118 | +4/-4
116 | gemma-2-2b-it (open) | 1118 | +3/-4
116 | internlm2_5-20b-chat (open) | 1116 | +6/-5
116 | mixtral-8x7b-instruct-v0.1 (open) | 1114 | +0/-0
116 | dbrx-instruct-preview (open) | 1113 | +4/-3
121 | gpt-3.5-turbo-1106 | 1109 | +4/-5
121 | granite-3.0-8b-instruct (open) | 1109 | +7/-7
121 | wizardlm-70b (open) | 1105 | +7/-6
121 | granite-3.1-2b-instruct (open) | 1104 | +10/-9
121 | snowflake-arctic-instruct (open) | 1102 | +3/-3
121 | yi-34b-chat (open) | 1101 | +4/-4
121 | openchat-3.5-0106 (open) | 1100 | +5/-5
128 | phi-3-small-8k-instruct (open) | 1098 | +4/-5
128 | openchat-3.5 (open) | 1094 | +5/-6
128 | llama-3.2-3b-instruct (open) | 1093 | +8/-5
128 | starling-lm-7b-beta (open) | 1092 | +5/-4
128 | vicuna-33b (open) | 1091 | +4/-4
128 | openhermes-2.5-mistral-7b (open) | 1090 | +8/-7
128 | qwq-32b-preview (open) | 1088 | +10/-10
135 | starling-lm-7b-alpha (open) | 1083 | +4/-5
135 | granite-3.0-2b-instruct (open) | 1081 | +8/-7
135 | pplx-70b-online | 1076 | +6/-7
135 | nous-hermes-2-mixtral-8x7b-dpo (open) | 1072 | +10/-10
135 | dolphin-2.2.1-mistral-7b | 1069 | +12/-12
140 | phi-3-mini-4k-instruct-june-2024 (open) | 1068 | +5/-5
140 | mistral-7b-instruct-v0.2 (open) | 1067 | +4/-4
140 | solar-10.7b-instruct-v1.0 | 1066 | +6/-8
140 | wizardlm-13b (open) | 1063 | +7/-6
140 | mpt-30b-chat (open) | 1059 | +11/-9
140 | falcon-180b-chat (open) | 1059 | +11/-15
140 | vicuna-13b (open) | 1058 | +5/-5
147 | phi-3-mini-4k-instruct (open) | 1056 | +4/-4
147 | smollm2-1.7b-instruct (open) | 1051 | +12/-14
147 | zephyr-7b-beta (open) | 1049 | +5/-7
147 | phi-3-mini-128k-instruct (open) | 1049 | +4/-4
147 | codellama-34b-instruct (open) | 1047 | +6/-6
147 | zephyr-7b-alpha (open) | 1046 | +12/-12
153 | llama-3.2-1b-instruct (open) | 1044 | +8/-6
153 | palm-2 | 1042 | +6/-6
153 | pplx-7b-online | 1041 | +6/-7
153 | guanaco-33b (open) | 1040 | +9/-12
153 | codellama-70b-instruct (open) | 1039 | +15/-15
153 | stripedhyena-nous-7b (open) | 1034 | +6/-8
159 | mistral-7b-instruct (open) | 1026 | +6/-6
159 | vicuna-7b (open) | 1021 | +7/-6
161 | olmo-7b-instruct (open) | 993 | +6/-6
162 | koala-13b (open) | 980 | +7/-7
162 | chatglm3-6b (open) | 974 | +10/-8
162 | gpt4all-13b-snoozy (open) | 973 | +13/-15
162 | mpt-7b-chat (open) | 967 | +10/-11
162 | alpaca-13b | 967 | +8/-7
167 | RWKV-4-Raven-14B (open) | 949 | +9/-8
167 | chatglm2-6b (open) | 942 | +10/-12
169 | oasst-pythia-12b (open) | 931 | +8/-9
170 | fastchat-t5-3b (open) | 896 | +9/-10
170 | chatglm-6b (open) | 888 | +9/-9
170 | dolly-v2-12b (open) | 881 | +9/-10
173 | stablelm-tuned-alpha-7b (open) | 859 | +13/-10


Remember: a rating difference of about 70 points corresponds to roughly a 60% expected win rate for the higher-rated model.
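
The 70-point rule of thumb can be checked with a short sketch. It assumes the ratings follow the standard Elo logistic curve (base 10, scale 400); the page does not state its exact formula, so treat that as an assumption. The function names `expected_win_rate` and `rating_gap_for` are illustrative only.

```python
import math


def expected_win_rate(rating_diff: float, scale: float = 400.0) -> float:
    """Expected win probability of the higher-rated model.

    Assumes the standard Elo logistic curve (base 10, scale 400);
    the leaderboard's exact formula is not given in the source.
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / scale))


def rating_gap_for(win_rate: float, scale: float = 400.0) -> float:
    """Inverse of expected_win_rate: rating gap needed for a target win rate."""
    return scale * math.log10(win_rate / (1.0 - win_rate))


if __name__ == "__main__":
    # A 70-point gap under the assumed curve.
    print(round(expected_win_rate(70), 3))
    # Gap needed for a 60% win rate under the assumed curve.
    print(round(rating_gap_for(0.60), 1))
    # Top two rows of the table: 1379 vs 1360.
    print(round(expected_win_rate(1379 - 1360), 3))
```

Under this assumption a 70-point gap gives an expected win rate of about 0.60, and a 19-point gap such as the one between the top two rows gives only a little over 0.52, which is why small rating differences should be read together with the 95% CI column.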