| Rank | Model | Tags | Rating | 95% CI |
|---|---|---|---|---|
| 1 | gemini-2.5-pro-preview 5/6 | new | 1385 | +9/-10 |
| 1 | gemini-2.5-pro-exp 3/25 | | 1377 | +6/-7 |
| 1 | o3 4/16 | | 1373 | +6/-6 |
| 4 | chatgpt-4o-latest 3/26 | | 1362 | +7/-6 |
| 4 | gpt-4.5-preview 2/27 | | 1357 | +4/-4 |
| 6 | gemini-2.5-flash-preview 4/17 | | 1338 | +6/-7 |
| 6 | gpt-4.1 4/14 | | 1330 | +7/-9 |
| 6 | grok-3-preview 2/24 | | 1328 | +5/-4 |
| 6 | o4-mini 4/16 | | 1325 | +9/-11 |
| 10 | deepseek-v3 3/24 | open | 1323 | +5/-6 |
| 10 | o1 12/17 2024 | | 1320 | +3/-3 |
| 10 | chatgpt-4o-latest 11/20 2024 | | 1319 | +3/-3 |
| 10 | deepseek-r1 | open | 1317 | +4/-4 |
| 14 | claude-3-7-sonnet-thinking-32k 2/19 | | 1313 | +4/-5 |
| 15 | o1-preview | | 1303 | +4/-4 |
| 15 | claude-3-7-sonnet 2/19 | | 1298 | +5/-4 |
| 15 | gpt-4.1-mini 4/14 | | 1295 | +9/-7 |
| 18 | o3-mini-high | | 1287 | +5/-4 |
| 18 | claude-3-5-sonnet 10/22 2024 | | 1286 | +3/-2 |
| 18 | qwen3-235b-a22b | new, open | 1285 | +11/-12 |
| 18 | gemini-2.0-flash v1 | | 1285 | +3/-4 |
| 18 | gemma-3-27b-it | open | 1282 | +5/-5 |
| 23 | deepseek-v3 | open | 1277 | +4/-4 |
| 23 | gemini-2.0-flash-lite-preview 2/5 | | 1270 | +4/-3 |
| 25 | o3-mini | | 1269 | +4/-4 |
| 25 | gemini-1.5-pro v2 | | 1269 | +2/-2 |
| 25 | hunyuan-turbos 2/26 | | 1266 | +10/-12 |
| 25 | gemma-3-12b-it | new, open | 1265 | +14/-11 |
| 25 | command-a Mar | open | 1264 | +5/-5 |
| 30 | gpt-4o 5/13 2024 | | 1264 | +2/-2 |
| 30 | claude-3-5-sonnet 6/20 2024 | | 1260 | +2/-2 |
| 30 | hunyuan-turbo 1/10 | | 1260 | +9/-9 |
| 30 | glm-4-plus 1/11 | | 1257 | +7/-6 |
| 34 | qwq-32b | open | 1256 | +6/-5 |
| 34 | o1-mini | | 1256 | +2/-2 |
| 34 | llama-3.1-405b-instruct-bf16 | open | 1254 | +3/-3 |
| 34 | llama-3.1-405b-instruct-fp8 | open | 1253 | +2/-2 |
| 34 | step-2-16k-exp Dec 2024 | | 1252 | +8/-10 |
| 34 | gpt-4o 8/6 2024 | | 1250 | +3/-2 |
| 34 | grok-2 8/13 2024 | | 1249 | +2/-2 |
| 34 | llama-4-maverick-17b-128e-instruct | open | 1249 | +7/-6 |
| 34 | llama-3.3-nemotron-49b-super v1 | open | 1248 | +10/-11 |
| 43 | yi-lightning | | 1247 | +3/-3 |
| 43 | hunyuan-large 2/10 | | 1243 | +10/-10 |
| 43 | gpt-4-turbo 4/9 2024 | | 1242 | +2/-2 |
| 43 | gpt-4.1-nano 4/14 | | 1240 | +7/-7 |
| 47 | yi-lightning-lite | | 1239 | +4/-4 |
| 47 | claude-3-opus 2/29 2024 | | 1239 | +2/-1 |
| 47 | glm-4-plus | | 1236 | +3/-3 |
| 47 | llama-3.3-70b-instruct | open | 1235 | +2/-3 |
| 47 | gpt-4o-mini 7/18 2024 | | 1235 | +2/-2 |
| 47 | gpt-4-1106-preview | | 1235 | +2/-2 |
| 47 | claude-3-5-haiku 10/22 2024 | | 1234 | +3/-2 |
| 54 | mistral-large Jul 2024 | open | 1232 | +3/-2 |
| 54 | athene-v2-chat | open | 1231 | +3/-3 |
| 54 | gpt-4-0125-preview | | 1231 | +2/-2 |
| 54 | hunyuan-standard 2/10 | | 1230 | +9/-8 |
| 54 | gemini-1.5-flash v2 | | 1230 | +4/-3 |
| 59 | grok-2-mini 8/13 2024 | | 1225 | +2/-2 |
| 59 | athene-70b 7/25 | open | 1223 | +3/-4 |
| 59 | qwen2.5-72b-instruct | open | 1223 | +3/-2 |
| 59 | gemma-3-4b-it | new, open | 1222 | +9/-9 |
| 59 | mistral-large Nov 2024 | open | 1222 | +3/-3 |
| 64 | llama-3.1-nemotron-70b-instruct | open | 1217 | +6/-5 |
| 64 | llama-3.1-70b-instruct | open | 1214 | +3/-2 |
| 64 | llama-3.1-tulu-3-70b | open | 1209 | +9/-9 |
| 64 | amazon-nova-pro-v1.0 | | 1208 | +3/-3 |
| 64 | reka-core 9/4 2024 | | 1208 | +6/-6 |
| 69 | gemma-2-27b-it | open | 1207 | +2/-2 |
| 69 | yi-large-preview | | 1207 | +3/-3 |
| 69 | jamba-1.5-large | open | 1206 | +5/-5 |
| 69 | llama-3.1-nemotron-51b-instruct | open | 1204 | +7/-9 |
| 73 | gemma-2-9b-it-simpo | open | 1200 | +5/-7 |
| 73 | gpt-4 3/14 | | 1200 | +2/-3 |
| 73 | claude-3-sonnet 2/29 2024 | | 1197 | +2/-2 |
| 73 | command-r-plus Aug 2024 | open | 1197 | +4/-4 |
| 73 | nemotron-4-340b-instruct | open | 1196 | +4/-4 |
| 73 | llama-3-70b-instruct | open | 1195 | +2/-2 |
| 73 | yi-large | | 1195 | +4/-4 |
| 73 | reka-flash 9/4 2024 | | 1193 | +5/-6 |
| 73 | mistral-small-24b-instruct Jan | open | 1192 | +5/-4 |
| 73 | qwen2.5-coder-32b-instruct | open | 1190 | +6/-6 |
| 73 | glm-4 5/20 | | 1189 | +5/-5 |
| 84 | gpt-4 6/13 | | 1185 | +2/-2 |
| 84 | c4ai-aya-expanse-32b | open | 1185 | +3/-3 |
| 84 | command-r-plus | open | 1183 | +2/-2 |
| 84 | amazon-nova-lite-v1.0 | | 1182 | +4/-4 |
| 88 | gemma-2-9b-it | open | 1181 | +2/-3 |
| 88 | qwen2-72b-instruct | open | 1181 | +3/-3 |
| 88 | gemini-1.5-flash-8b v1 | | 1180 | +3/-2 |
| 88 | claude-3-haiku 3/7 2024 | | 1178 | +2/-2 |
| 88 | phi-4 | open | 1178 | +4/-4 |
| 88 | command-r Aug 2024 | open | 1176 | +5/-5 |
| 88 | olmo-2-0325-32b-instruct | new | 1174 | +10/-12 |
| 95 | amazon-nova-micro-v1.0 | | 1164 | +3/-4 |
| 95 | glm-4 1/16 | | 1163 | +7/-6 |
| 95 | jamba-1.5-mini | open | 1160 | +4/-5 |
| 95 | ministral-8b Oct 2024 | open | 1160 | +8/-7 |
| 95 | claude-1 | | 1159 | +4/-4 |
| 100 | mistral-large Feb 2024 | | 1158 | +2/-2 |
| 100 | hunyuan-standard-256k | | 1153 | +10/-10 |
| 102 | reka-flash-21b-online 2/26 2024 | | 1151 | +4/-4 |
| 102 | c4ai-aya-expanse-8b | open | 1151 | +5/-4 |
| 102 | mixtral-8x22b-instruct v0.1 | open | 1149 | +3/-2 |
| 102 | mistral-next | | 1149 | +5/-5 |
| 102 | command-r | open | 1148 | +3/-3 |
| 102 | llama-3.1-tulu-3-8b | open | 1148 | +7/-8 |
| 102 | claude-2.0 | | 1147 | +5/-6 |
| 109 | llama-3-8b-instruct | open | 1143 | +2/-2 |
| 109 | reka-flash-21b 2/26 2024 | | 1142 | +4/-4 |
| 109 | gpt-3.5-turbo 3/14 | | 1142 | +11/-9 |
| 109 | mistral-medium | | 1141 | +3/-3 |
| 113 | gpt-3.5-turbo 1/25 | | 1137 | +2/-2 |
| 113 | gpt-3.5-turbo 6/13 | | 1137 | +4/-2 |
| 113 | granite-3.1-8b-instruct | open | 1135 | +10/-8 |
| 113 | claude-2.1 | | 1135 | +3/-3 |
| 113 | yi-1.5-34b-chat | open | 1134 | +3/-3 |
| 113 | llama-3.1-8b-instruct | open | 1133 | +3/-2 |
| 113 | zephyr-orpo-141b-A35b v0.1 | open | 1130 | +9/-8 |
| 120 | claude-instant-1 | | 1127 | +4/-4 |
| 121 | phi-3-medium-4k-instruct | open | 1118 | +4/-3 |
| 121 | gemma-2-2b-it | open | 1118 | +3/-3 |
| 121 | internlm2_5-20b-chat | open | 1116 | +5/-5 |
| 121 | mixtral-8x7b-instruct v0.1 | open | 1114 | +2/-2 |
| 121 | dbrx-instruct-preview | open | 1113 | +3/-2 |
| 121 | granite-3.0-8b-instruct | open | 1109 | +6/-7 |
| 127 | gpt-3.5-turbo 11/6 | | 1109 | +5/-5 |
| 127 | wizardlm-70b | open | 1105 | +7/-6 |
| 127 | granite-3.1-2b-instruct | open | 1104 | +8/-9 |
| 127 | snowflake-arctic-instruct | open | 1102 | +3/-3 |
| 127 | yi-34b-chat | open | 1101 | +4/-5 |
| 127 | openchat-3.5 1/6 | open | 1100 | +4/-4 |
| 133 | phi-3-small-8k-instruct | open | 1098 | +4/-4 |
| 133 | openchat-3.5 | open | 1094 | +5/-5 |
| 133 | llama-3.2-3b-instruct | open | 1094 | +6/-4 |
| 133 | starling-lm-7b-beta | open | 1092 | +5/-4 |
| 133 | vicuna-33b | open | 1091 | +4/-4 |
| 133 | openhermes-2.5-mistral-7b | open | 1089 | +9/-8 |
| 139 | starling-lm-7b-alpha | open | 1083 | +6/-5 |
| 139 | granite-3.0-2b-instruct | open | 1082 | +5/-7 |
| 139 | pplx-70b-online | | 1076 | +7/-7 |
| 142 | nous-hermes-2-mixtral-8x7b-dpo | open | 1072 | +6/-9 |
| 142 | dolphin-2.2.1-mistral-7b | | 1069 | +10/-12 |
| 142 | phi-3-mini-4k-instruct-june-2024 | open | 1068 | +5/-4 |
| 142 | mistral-7b-instruct v0.2 | open | 1067 | +4/-2 |
| 142 | solar-10.7b-instruct v1 | | 1066 | +9/-8 |
| 142 | wizardlm-13b | open | 1063 | +7/-8 |
| 142 | mpt-30b-chat | open | 1060 | +10/-9 |
| 142 | falcon-180b-chat | open | 1059 | +14/-15 |
| 150 | vicuna-13b | open | 1058 | +4/-4 |
| 150 | phi-3-mini-4k-instruct | open | 1056 | +5/-4 |
| 150 | smollm2-1.7b-instruct | open | 1051 | +13/-12 |
| 150 | zephyr-7b-beta | open | 1049 | +6/-5 |
| 154 | phi-3-mini-128k-instruct | open | 1049 | +5/-3 |
| 154 | codellama-34b-instruct | open | 1047 | +6/-6 |
| 154 | zephyr-7b-alpha | open | 1046 | +12/-12 |
| 154 | llama-3.2-1b-instruct | open | 1045 | +6/-6 |
| 154 | palm-2 | | 1042 | +7/-7 |
| 154 | pplx-7b-online | | 1041 | +7/-6 |
| 154 | guanaco-33b | open | 1040 | +10/-9 |
| 154 | codellama-70b-instruct | open | 1038 | +16/-17 |
| 162 | stripedhyena-nous-7b | open | 1034 | +8/-6 |
| 162 | mistral-7b-instruct | open | 1026 | +7/-5 |
| 164 | vicuna-7b | open | 1021 | +7/-6 |
| 165 | olmo-7b-instruct | open | 994 | +8/-7 |
| 165 | koala-13b | open | 980 | +8/-7 |
| 167 | chatglm3-6b | open | 974 | +10/-6 |
| 167 | gpt4all-13b-snoozy | open | 972 | +16/-10 |
| 167 | mpt-7b-chat | open | 967 | +9/-10 |
| 167 | alpaca-13b | | 966 | +10/-7 |
| 171 | RWKV-4-Raven-14B | open | 949 | +7/-9 |
| 171 | chatglm2-6b | open | 942 | +12/-9 |
| 173 | oasst-pythia-12b | open | 931 | +7/-7 |
| 174 | fastchat-t5-3b | open | 896 | +10/-9 |
| 174 | chatglm-6b | open | 888 | +10/-7 |
| 174 | dolly-v2-12b | open | 881 | +9/-8 |
| 177 | stablelm-tuned-alpha-7b | open | 859 | +11/-10 |

Remember: a rating difference of about 70 points corresponds to an expected win rate of about 60% for the higher-rated model.
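The 70-point rule of thumb can be checked with a short calculation. The page does not state its formula, so the sketch below assumes the ratings use the standard Elo logistic scale (base 10, 400-point scale factor). Under that assumption the expected win probability is 1 / (1 + 10^((R_B - R_A) / 400)). The function names are hypothetical and chosen for illustration.

```python
import math


def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    Assumption: Elo-style logistic scale, base 10, 400-point scale factor.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def gap_for_win_rate(p: float) -> float:
    """Rating gap needed for the higher-rated model to win with probability p.

    This inverts expected_win_rate under the same assumed scale.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return 400.0 * math.log10(p / (1.0 - p))


if __name__ == "__main__":
    # A 70-point gap gives roughly a 60% expected win rate.
    print(f"{expected_win_rate(1370, 1300):.3f}")  # → 0.599
    # Conversely, a 60% win rate needs roughly a 70-point gap.
    print(f"{gap_for_win_rate(0.60):.1f}")  # → 70.4
    # Ratings from the table: gemini-2.5-pro-preview 5/6 (1385) vs o3 4/16 (1373).
    print(f"{expected_win_rate(1385, 1373):.3f}")  # → 0.517
```

Under this assumed scale, the 12-point gap between the top two rated entries implies only a slight expected edge of about 52%. That small margin is in line with both sharing rank 1.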