Language Model Board: a better way to read the LM Arena results

Rank | Model | License | Rating | 95% CI
1 | gemini-2.5-pro | | 1456 | ±5
1 | gpt-5-high | | 1447 | ±7
1 | claude-opus-4-1 8/5 16k | | 1447 | ±7
2 | o3 4/16 | | 1444 | ±4
2 | chatgpt-4o-latest 3/26 | | 1443 | ±4
2 | gpt-4.5-preview 2/27 | | 1439 | ±6
1 | gpt-5-old | | 1439 | ±21
2 | claude-opus-4-1 8/5 | | 1435 | ±6
7 | gpt-5-chat | | 1426 | ±7
7 | qwen-max 8/15 | | 1425 | ±8
8 | grok-4 7/9 | | 1422 | ±6
8 | kimi-k2-0711-preview | open | 1421 | ±5
8 | claude-opus-4 5/14 16k | | 1419 | ±5
8 | qwen3-235b-a22b-instruct Jul | open | 1418 | ±6
8 | deepseek-v3.1 | | 1418 | ±8
8 | deepseek-r1 5/28 | open | 1417 | ±6
8 | deepseek-v3.1 Yes | | 1415 | ±9
9 | mistral-medium Aug | | 1411 | ±7
10 | glm-4.5 | open | 1410 | ±6
13 | claude-opus-4 5/14 | | 1409 | ±5
14 | gpt-4.1 4/14 | | 1409 | ±5
14 | grok-3-preview 2/24 | | 1409 | ±4
16 | gemini-2.5-flash | | 1405 | ±5
17 | qwen3-235b-a22b Jul Yes | open | 1400 | ±7
22 | o1 12/17 2024 | | 1399 | ±4
17 | mai-1-preview | | 1399 | ±9
22 | qwen3-235b-a22b No | open | 1398 | ±5
23 | claude-sonnet-4 5/14 32k | | 1398 | ±5
24 | deepseek-r1 | open | 1394 | ±5
24 | o4-mini 4/16 | | 1394 | ±5
24 | deepseek-v3 3/24 | open | 1392 | ±4
24 | gpt-5-mini-high | | 1390 | ±7
24 | hunyuan-t1 7/11 | | 1388 | ±8
31 | claude-3-7-sonnet 2/19 32k | | 1385 | ±4
24 | qwen-vl-max 8/13 | | 1384 | ±15
31 | mistral-medium May | | 1384 | ±5
31 | claude-sonnet-4 5/14 | | 1383 | ±5
32 | qwen3-coder-480b-a35b-instruct | open | 1381 | ±6
32 | hunyuan-turbos 4/16 | | 1381 | ±6
32 | qwen3-30b-a3b-instruct Jul | | 1380 | ±7
32 | gpt-4.1-mini 4/14 | | 1379 | ±5
32 | glm-4.5-air | open | 1377 | ±6
35 | qwen3-235b-a22b | open | 1374 | ±5
41 | claude-3-7-sonnet 2/19 | | 1369 | ±4
40 | minimax-m1 | open | 1368 | ±5
41 | claude-3-5-sonnet 10/22 2024 | | 1368 | ±3
44 | gemma-3-27b-it | open | 1364 | ±4
44 | o3-mini-high | | 1363 | ±5
44 | gemini-2.0-flash v1 | | 1362 | ±4
44 | grok-3-mini-high | | 1362 | ±6
46 | grok-3-mini-beta | | 1359 | ±5
47 | deepseek-v3 | open | 1357 | ±5
47 | gpt-oss-120b | | 1355 | ±7
47 | mistral-small Jun | open | 1355 | ±6
46 | step-3 | | 1353 | ±10
51 | gemini-2.0-flash-lite-preview 2/5 | | 1352 | ±4
51 | gemini-1.5-pro v2 | | 1350 | ±3
50 | gpt-5-nano-high | | 1348 | ±9
53 | o3-mini | | 1348 | ±4
53 | command-a Mar | open | 1347 | ±4
50 | hunyuan-turbos 2/26 | | 1347 | ±11
51 | qwen3-32b | open | 1346 | ±9
51 | llama-3.1-nemotron-ultra-253b v1 | | 1345 | ±11
56 | gpt-4o 5/13 2024 | | 1344 | ±3
49 | glm-4.5v | | 1342 | ±16
53 | glm-4-plus 1/11 | | 1342 | ±8
53 | nvidia-llama-3.3-nemotron-super-49b v1.5 | open | 1342 | ±9
58 | claude-3-5-sonnet 6/20 2024 | | 1341 | ±3
53 | gemma-3-12b-it | open | 1341 | ±9
53 | hunyuan-turbo 1/10 | | 1340 | ±11
60 | qwq-32b | open | 1337 | ±5
62 | o1-mini | | 1336 | ±3
62 | llama-3.1-405b-instruct-bf16 | open | 1335 | ±4
63 | gpt-4o 8/6 2024 | | 1334 | ±4
63 | llama-3.1-405b-instruct-fp8 | open | 1334 | ±3
64 | grok-2 8/13 2024 | | 1333 | ±3
61 | step-2-16k-exp Dec 2024 | | 1332 | ±8
64 | gpt-oss-20b | | 1329 | ±7
65 | qwen3-30b-a3b | open | 1329 | ±5
68 | yi-lightning | | 1328 | ±5
72 | llama-4-maverick-17b-128e-instruct | open | 1326 | ±5
63 | llama-3.3-nemotron-49b-super v1 | open | 1325 | ±12
65 | hunyuan-large 2/10 | | 1325 | ±10
76 | gpt-4-turbo 4/9 2024 | | 1324 | ±4
75 | step-1o-turbo Jun | | 1323 | ±6
77 | claude-3-opus 2/29 2024 | | 1323 | ±3
76 | gpt-4.1-nano 4/14 | | 1321 | ±8
77 | amazon-nova-experimental-chat 5/14 | | 1320 | ±5
79 | llama-3.3-70b-instruct | open | 1320 | ±3
77 | llama-4-scout-17b-16e-instruct | open | 1320 | ±5
81 | claude-3-5-haiku 10/22 2024 | | 1318 | ±3
80 | glm-4-plus | | 1318 | ±5
79 | gemma-3n-e4b-it | | 1318 | ±5
84 | gpt-4o-mini 7/18 2024 | | 1316 | ±3
84 | gpt-4-1106-preview | | 1315 | ±4
84 | gpt-4-0125-preview | | 1315 | ±4
84 | athene-v2-chat | open | 1314 | ±4
84 | mistral-large Jul 2024 | open | 1314 | ±4
86 | gemini-1.5-flash v2 | | 1312 | ±4
84 | hunyuan-standard 2/10 | | 1310 | ±10
98 | grok-2-mini 8/13 2024 | | 1307 | ±3
98 | mistral-large Nov 2024 | open | 1305 | ±4
97 | athene-70b 7/25 | open | 1305 | ±5
94 | gemma-3-4b-it | open | 1304 | ±9
100 | qwen2.5-72b-instruct | open | 1303 | ±4
100 | magistral-medium Jun | | 1301 | ±7
100 | mistral-small-3.1-24b-instruct Mar | open | 1301 | ±5
100 | llama-3.1-nemotron-70b-instruct | open | 1298 | ±7
100 | hunyuan-large-vision | | 1296 | ±9
105 | llama-3.1-70b-instruct | open | 1295 | ±3
107 | amazon-nova-pro-v1.0 | | 1290 | ±4
105 | jamba-1.5-large | open | 1289 | ±7
104 | llama-3.1-tulu-3-70b | open | 1289 | ±10
105 | reka-core 9/4 2024 | | 1289 | ±7
108 | gpt-4 3/14 | | 1288 | ±5
110 | gemma-2-27b-it | open | 1287 | ±3
105 | llama-3.1-nemotron-51b-instruct | open | 1287 | ±10
111 | gemma-2-9b-it-simpo | open | 1280 | ±7
112 | nemotron-4-340b-instruct | open | 1280 | ±5
111 | command-r-plus Aug 2024 | open | 1279 | ±6
116 | llama-3-70b-instruct | open | 1277 | ±3
116 | gpt-4 6/13 | | 1276 | ±4
114 | glm-4 5/20 | | 1276 | ±7
115 | reka-flash 9/4 2024 | | 1276 | ±7
116 | mistral-small-24b-instruct Jan | open | 1276 | ±6
116 | qwen2.5-coder-32b-instruct | open | 1272 | ±8
121 | c4ai-aya-expanse-32b | open | 1269 | ±5
124 | command-r-plus | open | 1266 | ±4
126 | gemma-2-9b-it | open | 1265 | ±4
124 | qwen2-72b-instruct | open | 1265 | ±5
126 | claude-3-haiku 3/7 2024 | | 1263 | ±4
126 | amazon-nova-lite-v1.0 | | 1262 | ±5
126 | gemini-1.5-flash-8b v1 | | 1262 | ±4
128 | phi-4 | open | 1259 | ±4
126 | olmo-2-0325-32b-instruct | | 1256 | ±11
129 | command-r Aug 2024 | open | 1255 | ±6
135 | mistral-large Feb 2024 | | 1245 | ±5
135 | amazon-nova-micro-v1.0 | | 1245 | ±5
136 | jamba-1.5-mini | open | 1241 | ±7
135 | ministral-8b Oct 2024 | open | 1241 | ±9
136 | hunyuan-standard-256k | | 1237 | ±11
137 | reka-flash-21b-online 2/26 2024 | | 1236 | ±7
139 | mixtral-8x22b-instruct v0.1 | open | 1233 | ±4
139 | command-r | open | 1232 | ±5
139 | reka-flash-21b 2/26 2024 | | 1230 | ±6
139 | c4ai-aya-expanse-8b | open | 1228 | ±7
140 | mistral-medium | | 1227 | ±5
141 | gpt-3.5-turbo 1/25 | | 1226 | ±5
141 | llama-3-8b-instruct | open | 1226 | ±3
139 | llama-3.1-tulu-3-8b | open | 1225 | ±10
146 | yi-1.5-34b-chat | open | 1218 | ±5
143 | zephyr-orpo-141b-A35b v0.1 | open | 1217 | ±10
150 | llama-3.1-8b-instruct | open | 1215 | ±4
145 | granite-3.1-8b-instruct | open | 1214 | ±10
152 | gpt-3.5-turbo 11/6 | | 1204 | ±9
153 | phi-3-medium-4k-instruct | open | 1202 | ±5
154 | mixtral-8x7b-instruct v0.1 | open | 1202 | ±4
153 | internlm2_5-20b-chat | open | 1199 | ±7
154 | dbrx-instruct-preview | open | 1199 | ±6
155 | wizardlm-70b | open | 1189 | ±9
157 | granite-3.0-8b-instruct | open | 1189 | ±8
158 | yi-34b-chat | open | 1188 | ±7
158 | openchat-3.5 1/6 | open | 1187 | ±8
158 | openchat-3.5 | open | 1185 | ±10
157 | granite-3.1-2b-instruct | open | 1185 | ±11
160 | snowflake-arctic-instruct | open | 1184 | ±6
160 | openhermes-2.5-mistral-7b | open | 1179 | ±10
160 | vicuna-33b | open | 1178 | ±6
160 | starling-lm-7b-beta | open | 1177 | ±7
160 | phi-3-small-8k-instruct | open | 1177 | ±6
161 | starling-lm-7b-alpha | open | 1173 | ±8
162 | llama-3.2-3b-instruct | open | 1173 | ±7
160 | nous-hermes-2-mixtral-8x7b-dpo | open | 1171 | ±12
168 | granite-3.0-2b-instruct | open | 1163 | ±8
167 | solar-10.7b-instruct v1 | | 1159 | ±13
167 | dolphin-2.2.1-mistral-7b | | 1158 | ±15
173 | mistral-7b-instruct v0.2 | open | 1156 | ±6
171 | mpt-30b-chat | open | 1155 | ±12
173 | wizardlm-13b | open | 1155 | ±9
171 | falcon-180b-chat | open | 1150 | ±17
175 | phi-3-mini-4k-instruct-june-2024 | open | 1149 | ±6
175 | vicuna-13b | open | 1146 | ±7
175 | codellama-34b-instruct | open | 1142 | ±9
176 | palm-2 | | 1139 | ±9
178 | phi-3-mini-128k-instruct | open | 1137 | ±7
178 | zephyr-7b-beta | open | 1137 | ±9
181 | phi-3-mini-4k-instruct | open | 1135 | ±6
176 | zephyr-7b-alpha | open | 1133 | ±16
178 | guanaco-33b | open | 1132 | ±12
178 | smollm2-1.7b-instruct | open | 1130 | ±13
179 | codellama-70b-instruct | open | 1125 | ±18
182 | stripedhyena-nous-7b | open | 1125 | ±11
186 | llama-3.2-1b-instruct | open | 1122 | ±7
187 | vicuna-7b | open | 1119 | ±9
188 | mistral-7b-instruct | open | 1115 | ±9
196 | olmo-7b-instruct | open | 1080 | ±11
196 | koala-13b | open | 1075 | ±10
196 | gpt4all-13b-snoozy | open | 1067 | ±15
196 | alpaca-13b | | 1067 | ±11
196 | mpt-7b-chat | open | 1065 | ±12
196 | chatglm3-6b | open | 1060 | ±12
198 | RWKV-4-Raven-14B | open | 1045 | ±11
202 | chatglm2-6b | open | 1031 | ±13
202 | oasst-pythia-12b | open | 1025 | ±11
205 | chatglm-6b | open | 1001 | ±13
205 | fastchat-t5-3b | open | 995 | ±12
205 | dolly-v2-12b | open | 980 | ±14
207 | stablelm-tuned-alpha-7b | open | 956 | ±13


Remember: a rating difference of about 70 points corresponds to a 60% win rate for the higher-rated model.
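You can check that rule of thumb with the standard Elo/Bradley-Terry win-probability formula, using the usual scale factor of 400 (an assumption here; the source does not state the scale LM Arena uses):

```python
# Elo / Bradley-Terry win probability for a given rating gap.
# Assumes the conventional scale factor of 400 rating points.
def win_probability(rating_gap: float) -> float:
    """Probability that the higher-rated model wins, given the gap in points."""
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))

print(f"{win_probability(70):.3f}")  # 0.599, i.e. roughly a 60% win rate
```

So, for example, the ~70-point gap between gemini-2.5-pro (1456) and claude-3-7-sonnet 2/19 32k (1385) would imply the former wins about 6 battles in 10.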