Language Model Board, a better way to read the LM Arena results

| Rank | Model | Open | Rating | 95% CI |
|-----:|-------|:----:|-------:|:------:|
| 1 | gemini-2.5-pro | | 1459 | ±6 |
| 1 | o3 4/16 | | 1451 | ±5 |
| 2 | chatgpt-4o-latest 3/26 | | 1442 | ±5 |
| 3 | gpt-4.5-preview 2/27 | | 1438 | ±6 |
| 3 | qwen3-235b-a22b-instruct Jul | open | 1429 | ±9 |
| 4 | grok-4 7/9 | | 1428 | ±6 |
| 5 | kimi-k2-0711-preview | open | 1420 | ±6 |
| 5 | claude-opus-4 5/14 16k | | 1419 | ±6 |
| 5 | deepseek-r1 5/28 | open | 1419 | ±6 |
| 5 | glm-4.5 | open | 1418 | ±9 |
| 7 | claude-opus-4 5/14 | | 1412 | ±5 |
| 9 | grok-3-preview 2/24 | | 1409 | ±5 |
| 10 | gpt-4.1 4/14 | | 1408 | ±5 |
| 10 | gemini-2.5-flash | | 1408 | ±5 |
| 12 | claude-sonnet-4 5/14 32k | | 1399 | ±6 |
| 11 | qwen3-235b-a22b Jul Yes | open | 1399 | ±10 |
| 13 | qwen3-235b-a22b No | open | 1398 | ±5 |
| 15 | chatgpt-4o-latest 11/20 2024 | | 1398 | ±4 |
| 15 | o1 12/17 2024 | | 1398 | ±4 |
| 14 | o4-mini 4/16 | | 1397 | ±5 |
| 15 | deepseek-r1 | open | 1393 | ±5 |
| 15 | deepseek-v3 3/24 | open | 1391 | ±5 |
| 20 | claude-sonnet-4 5/14 | | 1386 | ±6 |
| 20 | claude-3-7-sonnet 2/19 32k | | 1385 | ±5 |
| 22 | mistral-medium May | | 1382 | ±5 |
| 20 | glm-4.5-air | open | 1382 | ±9 |
| 21 | qwen3-coder-480b-a35b-instruct | open | 1381 | ±8 |
| 22 | hunyuan-turbos 4/16 | | 1381 | ±6 |
| 20 | qwen3-30b-a3b-instruct Jul | | 1377 | ±12 |
| 25 | minimax-m1 | open | 1373 | ±6 |
| 25 | gpt-4.1-mini 4/14 | | 1373 | ±5 |
| 26 | qwen3-235b-a22b | open | 1372 | ±5 |
| 29 | claude-3-7-sonnet 2/19 | | 1368 | ±4 |
| 29 | claude-3-5-sonnet 10/22 2024 | | 1365 | ±3 |
| 29 | gemini-2.0-flash v1 | | 1364 | ±4 |
| 29 | gemma-3-27b-it | open | 1363 | ±4 |
| 29 | o3-mini-high | | 1363 | ±5 |
| 33 | grok-3-mini-high | | 1358 | ±7 |
| 36 | deepseek-v3 | open | 1356 | ±5 |
| 35 | grok-3-mini-beta | | 1355 | ±6 |
| 38 | gemini-2.0-flash-lite-preview 2/5 | | 1351 | ±4 |
| 38 | gemini-1.5-pro v2 | | 1350 | ±3 |
| 38 | mistral-small Jun | open | 1349 | ±7 |
| 40 | command-a Mar | open | 1346 | ±5 |
| 38 | qwen3-32b | open | 1346 | ±9 |
| 38 | hunyuan-turbos 2/26 | | 1346 | ±11 |
| 38 | llama-3.1-nemotron-ultra-253b v1 | | 1344 | ±12 |
| 41 | o3-mini | | 1344 | ±4 |
| 43 | gpt-4o 5/13 2024 | | 1343 | ±3 |
| 38 | nvidia-llama-3.3-nemotron-super-49b v1.5 | open | 1342 | ±10 |
| 40 | glm-4-plus 1/11 | | 1341 | ±8 |
| 40 | gemma-3-12b-it | open | 1340 | ±9 |
| 40 | hunyuan-turbo 1/10 | | 1339 | ±11 |
| 44 | claude-3-5-sonnet 6/20 2024 | | 1339 | ±3 |
| 46 | qwq-32b | open | 1335 | ±5 |
| 47 | o1-mini | | 1334 | ±3 |
| 47 | llama-3.1-405b-instruct-bf16 | open | 1334 | ±4 |
| 48 | llama-3.1-405b-instruct-fp8 | open | 1332 | ±3 |
| 48 | gpt-4o 8/6 2024 | | 1332 | ±4 |
| 50 | grok-2 8/13 2024 | | 1331 | ±3 |
| 47 | step-2-16k-exp Dec 2024 | | 1331 | ±8 |
| 49 | llama-4-maverick-17b-128e-instruct | open | 1330 | ±5 |
| 50 | qwen3-30b-a3b | open | 1328 | ±5 |
| 53 | yi-lightning | | 1327 | ±5 |
| 48 | llama-3.3-nemotron-49b-super v1 | open | 1324 | ±12 |
| 50 | hunyuan-large 2/10 | | 1324 | ±10 |
| 61 | gpt-4-turbo 4/9 2024 | | 1323 | ±4 |
| 61 | yi-lightning-lite | | 1322 | ±5 |
| 62 | claude-3-opus 2/29 2024 | | 1321 | ±3 |
| 61 | gpt-4.1-nano 4/14 | | 1320 | ±8 |
| 65 | glm-4-plus | | 1317 | ±5 |
| 62 | amazon-nova-experimental-chat 5/14 | | 1317 | ±6 |
| 65 | llama-3.3-70b-instruct | open | 1316 | ±4 |
| 64 | llama-4-scout-17b-16e-instruct | open | 1316 | ±6 |
| 64 | gemma-3n-e4b-it | | 1316 | ±6 |
| 65 | claude-3-5-haiku 10/22 2024 | | 1316 | ±3 |
| 66 | gpt-4o-mini 7/18 2024 | | 1315 | ±3 |
| 67 | gpt-4-1106-preview | | 1314 | ±4 |
| 67 | gpt-4-0125-preview | | 1314 | ±4 |
| 67 | athene-v2-chat | open | 1313 | ±4 |
| 68 | mistral-large Jul 2024 | open | 1312 | ±4 |
| 68 | gemini-1.5-flash v2 | | 1311 | ±4 |
| 66 | hunyuan-standard 2/10 | | 1308 | ±10 |
| 80 | grok-2-mini 8/13 2024 | | 1305 | ±4 |
| 81 | mistral-large Nov 2024 | open | 1304 | ±4 |
| 80 | athene-70b 7/25 | open | 1304 | ±5 |
| 72 | gemma-3-4b-it | open | 1303 | ±9 |
| 83 | qwen2.5-72b-instruct | open | 1301 | ±4 |
| 83 | magistral-medium Jun | | 1299 | ±7 |
| 83 | llama-3.1-nemotron-70b-instruct | open | 1297 | ±8 |
| 83 | hunyuan-large-vision | | 1296 | ±9 |
| 87 | llama-3.1-70b-instruct | open | 1294 | ±4 |
| 85 | mistral-small-3.1-24b-instruct Mar | open | 1293 | ±6 |
| 89 | amazon-nova-pro-v1.0 | | 1288 | ±4 |
| 87 | llama-3.1-tulu-3-70b | open | 1288 | ±10 |
| 89 | yi-large-preview | | 1288 | ±5 |
| 88 | jamba-1.5-large | open | 1288 | ±7 |
| 88 | reka-core 9/4 2024 | | 1287 | ±7 |
| 90 | gpt-4 3/14 | | 1286 | ±5 |
| 88 | llama-3.1-nemotron-51b-instruct | open | 1286 | ±10 |
| 92 | gemma-2-27b-it | open | 1285 | ±3 |
| 94 | gemma-2-9b-it-simpo | open | 1279 | ±7 |
| 95 | nemotron-4-340b-instruct | open | 1279 | ±5 |
| 94 | command-r-plus Aug 2024 | open | 1278 | ±6 |
| 100 | llama-3-70b-instruct | open | 1275 | ±3 |
| 97 | glm-4 5/20 | | 1275 | ±7 |
| 100 | mistral-small-24b-instruct Jan | open | 1274 | ±6 |
| 98 | reka-flash 9/4 2024 | | 1274 | ±7 |
| 100 | gpt-4 6/13 | | 1274 | ±4 |
| 100 | qwen2.5-coder-32b-instruct | open | 1270 | ±8 |
| 104 | c4ai-aya-expanse-32b | open | 1268 | ±5 |
| 108 | command-r-plus | open | 1264 | ±4 |
| 110 | gemma-2-9b-it | open | 1264 | ±4 |
| 108 | qwen2-72b-instruct | open | 1264 | ±5 |
| 110 | claude-3-haiku 3/7 2024 | | 1261 | ±4 |
| 110 | amazon-nova-lite-v1.0 | | 1261 | ±5 |
| 110 | gemini-1.5-flash-8b v1 | | 1261 | ±4 |
| 112 | phi-4 | open | 1257 | ±4 |
| 110 | olmo-2-0325-32b-instruct | | 1256 | ±11 |
| 112 | command-r Aug 2024 | open | 1254 | ±6 |
| 116 | claude-1 | | 1251 | ±7 |
| 119 | amazon-nova-micro-v1.0 | | 1243 | ±5 |
| 120 | mistral-large Feb 2024 | | 1243 | ±5 |
| 119 | glm-4 1/16 | | 1241 | ±9 |
| 120 | jamba-1.5-mini | open | 1240 | ±7 |
| 119 | mistral-next | | 1240 | ±9 |
| 120 | ministral-8b Oct 2024 | open | 1239 | ±9 |
| 120 | hunyuan-standard-256k | | 1236 | ±11 |
| 122 | reka-flash-21b-online 2/26 2024 | | 1235 | ±7 |
| 121 | gpt-3.5-turbo 3/14 | | 1233 | ±11 |
| 124 | mixtral-8x22b-instruct v0.1 | open | 1231 | ±4 |
| 124 | command-r | open | 1230 | ±5 |
| 124 | reka-flash-21b 2/26 2024 | | 1228 | ±6 |
| 124 | gpt-3.5-turbo 6/13 | | 1227 | ±6 |
| 124 | c4ai-aya-expanse-8b | open | 1227 | ±7 |
| 126 | mistral-medium | | 1226 | ±5 |
| 124 | llama-3.1-tulu-3-8b | open | 1225 | ±10 |
| 128 | llama-3-8b-instruct | open | 1224 | ±4 |
| 128 | gpt-3.5-turbo 1/25 | | 1224 | ±5 |
| 133 | yi-1.5-34b-chat | open | 1217 | ±5 |
| 131 | zephyr-orpo-141b-A35b v0.1 | open | 1215 | ±11 |
| 139 | llama-3.1-8b-instruct | open | 1214 | ±4 |
| 135 | claude-instant-1 | | 1214 | ±7 |
| 132 | granite-3.1-8b-instruct | open | 1213 | ±11 |
| 142 | gpt-3.5-turbo 11/6 | | 1201 | ±9 |
| 143 | phi-3-medium-4k-instruct | open | 1201 | ±5 |
| 144 | mixtral-8x7b-instruct v0.1 | open | 1200 | ±4 |
| 143 | internlm2_5-20b-chat | open | 1199 | ±7 |
| 144 | dbrx-instruct-preview | open | 1197 | ±6 |
| 145 | wizardlm-70b | open | 1187 | ±9 |
| 147 | granite-3.0-8b-instruct | open | 1187 | ±8 |
| 147 | yi-34b-chat | open | 1187 | ±7 |
| 147 | openchat-3.5 1/6 | open | 1185 | ±8 |
| 147 | granite-3.1-2b-instruct | open | 1183 | ±11 |
| 148 | openchat-3.5 | open | 1183 | ±10 |
| 150 | snowflake-arctic-instruct | open | 1182 | ±6 |
| 150 | openhermes-2.5-mistral-7b | open | 1177 | ±10 |
| 150 | vicuna-33b | open | 1176 | ±6 |
| 150 | starling-lm-7b-beta | open | 1176 | ±7 |
| 150 | phi-3-small-8k-instruct | open | 1175 | ±6 |
| 151 | llama-3.2-3b-instruct | open | 1171 | ±7 |
| 151 | starling-lm-7b-alpha | open | 1171 | ±8 |
| 150 | nous-hermes-2-mixtral-8x7b-dpo | open | 1169 | ±12 |
| 159 | granite-3.0-2b-instruct | open | 1161 | ±8 |
| 159 | pplx-70b-online | | 1159 | ±10 |
| 157 | solar-10.7b-instruct v1 | | 1157 | ±13 |
| 157 | dolphin-2.2.1-mistral-7b | | 1156 | ±15 |
| 163 | mistral-7b-instruct v0.2 | open | 1154 | ±6 |
| 161 | mpt-30b-chat | open | 1153 | ±12 |
| 163 | wizardlm-13b | open | 1152 | ±9 |
| 161 | falcon-180b-chat | open | 1148 | ±17 |
| 164 | phi-3-mini-4k-instruct-june-2024 | open | 1148 | ±6 |
| 165 | vicuna-13b | open | 1144 | ±7 |
| 166 | codellama-34b-instruct | open | 1140 | ±9 |
| 167 | palm-2 | | 1136 | ±9 |
| 169 | phi-3-mini-128k-instruct | open | 1135 | ±7 |
| 168 | zephyr-7b-beta | open | 1135 | ±9 |
| 172 | phi-3-mini-4k-instruct | open | 1134 | ±6 |
| 167 | zephyr-7b-alpha | open | 1131 | ±16 |
| 169 | guanaco-33b | open | 1130 | ±12 |
| 169 | smollm2-1.7b-instruct | open | 1129 | ±13 |
| 173 | pplx-7b-online | | 1126 | ±10 |
| 170 | codellama-70b-instruct | open | 1124 | ±18 |
| 173 | stripedhyena-nous-7b | open | 1123 | ±11 |
| 175 | llama-3.2-1b-instruct | open | 1121 | ±7 |
| 179 | vicuna-7b | open | 1117 | ±9 |
| 179 | mistral-7b-instruct | open | 1112 | ±9 |
| 188 | olmo-7b-instruct | open | 1079 | ±11 |
| 188 | koala-13b | open | 1072 | ±10 |
| 188 | gpt4all-13b-snoozy | open | 1064 | ±15 |
| 188 | alpaca-13b | | 1063 | ±11 |
| 188 | mpt-7b-chat | open | 1062 | ±12 |
| 188 | chatglm3-6b | open | 1058 | ±12 |
| 190 | RWKV-4-Raven-14B | open | 1043 | ±11 |
| 194 | chatglm2-6b | open | 1028 | ±13 |
| 194 | oasst-pythia-12b | open | 1022 | ±11 |
| 197 | chatglm-6b | open | 998 | ±13 |
| 197 | fastchat-t5-3b | open | 992 | ±12 |
| 197 | dolly-v2-12b | open | 977 | ±14 |
| 199 | stablelm-tuned-alpha-7b | open | 954 | ±13 |

Remember: a 70-point rating difference corresponds to roughly a 60% expected win rate.
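The 70-points-to-60% rule of thumb falls out of the standard Elo expected-score formula with the conventional 400-point scale (a minimal sketch; assuming Arena-style ratings use this standard formula, which the page itself does not state):

```python
# Elo expected score: P(A beats B) = 1 / (1 + 10^((R_B - R_A) / 400)).
# Assumption: the ratings above behave like standard Elo with the usual
# base-10 / 400-point convention; the function name is illustrative.
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under a standard Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# A 70-point gap gives roughly a 60% win rate:
print(round(expected_win_rate(1459, 1389), 3))  # ≈ 0.599
```

This is why adjacent rows on the board are effectively ties: a 5-to-10 point gap, well inside the confidence intervals, moves the expected win rate by only a percentage point or two.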