Language Model Board, a better way to read the LM Arena results

| Rank | Model | Rating | 95% CI |
|---|---|---|---|
| 1 | gemini-2.5-pro | 1456 | +5/-5 |
| 1 | gpt-5-high | 1447 | +7/-7 |
| 1 | claude-opus-4-1 8/5 16k | 1447 | +7/-7 |
| 2 | o3 4/16 | 1444 | +4/-4 |
| 2 | chatgpt-4o-latest 3/26 | 1443 | +4/-4 |
| 2 | gpt-4.5-preview 2/27 | 1439 | +6/-6 |
| 2 | claude-opus-4-1 8/5 | 1435 | +6/-6 |
| 7 | gpt-5-chat | 1426 | +7/-7 |
| 7 | qwen-max 8/15 | 1425 | +8/-8 |
| 8 | grok-4 7/9 | 1422 | +6/-6 |
| 8 | kimi-k2-0711-preview (open) | 1421 | +5/-5 |
| 8 | claude-opus-4 5/14 16k | 1419 | +5/-5 |
| 8 | qwen3-235b-a22b-instruct Jul (open) | 1418 | +6/-6 |
| 8 | deepseek-v3.1 (open) | 1418 | +8/-8 |
| 8 | deepseek-r1 5/28 (open) | 1417 | +6/-6 |
| 8 | deepseek-v3.1 Yes (open) | 1415 | +9/-9 |
| 9 | mistral-medium Aug | 1411 | +7/-7 |
| 9 | glm-4.5 (open) | 1410 | +6/-6 |
| 12 | claude-opus-4 5/14 | 1409 | +5/-5 |
| 13 | gpt-4.1 4/14 | 1409 | +5/-5 |
| 13 | grok-3-preview 2/24 | 1409 | +4/-4 |
| 15 | gemini-2.5-flash | 1405 | +5/-5 |
| 16 | qwen3-235b-a22b Jul Yes (open) | 1400 | +7/-7 |
| 21 | o1 12/17 2024 | 1399 | +4/-4 |
| 16 | mai-1-preview | 1399 | +9/-9 |
| 21 | qwen3-235b-a22b No (open) | 1398 | +5/-5 |
| 22 | claude-sonnet-4 5/14 32k | 1398 | +5/-5 |
| 23 | deepseek-r1 (open) | 1394 | +5/-5 |
| 23 | o4-mini 4/16 | 1394 | +5/-5 |
| 23 | deepseek-v3 3/24 (open) | 1392 | +4/-4 |
| 23 | gpt-5-mini-high | 1390 | +7/-7 |
| 23 | hunyuan-t1 7/11 | 1388 | +8/-8 |
| 30 | claude-3-7-sonnet 2/19 32k | 1385 | +4/-4 |
| 23 | qwen-vl-max 8/13 | 1384 | +15/-15 |
| 30 | mistral-medium May | 1384 | +5/-5 |
| 30 | claude-sonnet-4 5/14 | 1383 | +5/-5 |
| 31 | qwen3-coder-480b-a35b-instruct (open) | 1381 | +6/-6 |
| 31 | hunyuan-turbos 4/16 | 1381 | +6/-6 |
| 31 | qwen3-30b-a3b-instruct Jul | 1380 | +7/-7 |
| 31 | gpt-4.1-mini 4/14 | 1379 | +5/-5 |
| 31 | glm-4.5-air (open) | 1377 | +6/-6 |
| 34 | qwen3-235b-a22b (open) | 1374 | +5/-5 |
| 40 | claude-3-7-sonnet 2/19 | 1369 | +4/-4 |
| 39 | minimax-m1 (open) | 1368 | +5/-5 |
| 40 | claude-3-5-sonnet 10/22 2024 | 1368 | +3/-3 |
| 43 | gemma-3-27b-it (open) | 1364 | +4/-4 |
| 43 | o3-mini-high | 1363 | +5/-5 |
| 43 | gemini-2.0-flash v1 | 1362 | +4/-4 |
| 43 | grok-3-mini-high | 1362 | +6/-6 |
| 45 | grok-3-mini-beta | 1359 | +5/-5 |
| 46 | deepseek-v3 (open) | 1357 | +5/-5 |
| 46 | gpt-oss-120b (open) | 1355 | +7/-7 |
| 46 | mistral-small Jun (open) | 1355 | +6/-6 |
| 45 | step-3 | 1353 | +10/-10 |
| 50 | gemini-2.0-flash-lite-preview 2/5 | 1352 | +4/-4 |
| 50 | gemini-1.5-pro v2 | 1350 | +3/-3 |
| 49 | gpt-5-nano-high | 1348 | +9/-9 |
| 52 | o3-mini | 1348 | +4/-4 |
| 52 | command-a Mar (open) | 1347 | +4/-4 |
| 49 | hunyuan-turbos 2/26 | 1347 | +11/-11 |
| 50 | qwen3-32b (open) | 1346 | +9/-9 |
| 50 | llama-3.1-nemotron-ultra-253b v1 | 1345 | +11/-11 |
| 55 | gpt-4o 5/13 2024 | 1344 | +3/-3 |
| 48 | glm-4.5v | 1342 | +16/-16 |
| 52 | glm-4-plus 1/11 | 1342 | +8/-8 |
| 52 | nvidia-llama-3.3-nemotron-super-49b v1.5 (open) | 1342 | +9/-9 |
| 57 | claude-3-5-sonnet 6/20 2024 | 1341 | +3/-3 |
| 52 | gemma-3-12b-it (open) | 1341 | +9/-9 |
| 52 | hunyuan-turbo 1/10 | 1340 | +11/-11 |
| 59 | qwq-32b (open) | 1337 | +5/-5 |
| 61 | o1-mini | 1336 | +3/-3 |
| 61 | llama-3.1-405b-instruct-bf16 (open) | 1335 | +4/-4 |
| 62 | gpt-4o 8/6 2024 | 1334 | +4/-4 |
| 62 | llama-3.1-405b-instruct-fp8 (open) | 1334 | +3/-3 |
| 63 | grok-2 8/13 2024 | 1333 | +3/-3 |
| 60 | step-2-16k-exp Dec 2024 | 1332 | +8/-8 |
| 63 | gpt-oss-20b (open) | 1329 | +7/-7 |
| 64 | qwen3-30b-a3b (open) | 1329 | +5/-5 |
| 67 | yi-lightning | 1328 | +5/-5 |
| 71 | llama-4-maverick-17b-128e-instruct (open) | 1326 | +5/-5 |
| 62 | llama-3.3-nemotron-49b-super v1 (open) | 1325 | +12/-12 |
| 64 | hunyuan-large 2/10 | 1325 | +10/-10 |
| 75 | gpt-4-turbo 4/9 2024 | 1324 | +4/-4 |
| 74 | step-1o-turbo Jun | 1323 | +6/-6 |
| 76 | claude-3-opus 2/29 2024 | 1323 | +3/-3 |
| 75 | gpt-4.1-nano 4/14 | 1321 | +8/-8 |
| 76 | amazon-nova-experimental-chat 5/14 | 1320 | +5/-5 |
| 78 | llama-3.3-70b-instruct (open) | 1320 | +3/-3 |
| 76 | llama-4-scout-17b-16e-instruct (open) | 1320 | +5/-5 |
| 80 | claude-3-5-haiku 10/22 2024 | 1318 | +3/-3 |
| 79 | glm-4-plus | 1318 | +5/-5 |
| 78 | gemma-3n-e4b-it | 1318 | +5/-5 |
| 83 | gpt-4o-mini 7/18 2024 | 1316 | +3/-3 |
| 83 | gpt-4-1106-preview | 1315 | +4/-4 |
| 83 | gpt-4-0125-preview | 1315 | +4/-4 |
| 83 | athene-v2-chat (open) | 1314 | +4/-4 |
| 83 | mistral-large Jul 2024 (open) | 1314 | +4/-4 |
| 85 | gemini-1.5-flash v2 | 1312 | +4/-4 |
| 83 | hunyuan-standard 2/10 | 1310 | +10/-10 |
| 97 | grok-2-mini 8/13 2024 | 1307 | +3/-3 |
| 97 | mistral-large Nov 2024 (open) | 1305 | +4/-4 |
| 96 | athene-70b 7/25 (open) | 1305 | +5/-5 |
| 93 | gemma-3-4b-it (open) | 1304 | +9/-9 |
| 99 | qwen2.5-72b-instruct (open) | 1303 | +4/-4 |
| 99 | magistral-medium Jun | 1301 | +7/-7 |
| 99 | mistral-small-3.1-24b-instruct Mar (open) | 1301 | +5/-5 |
| 99 | llama-3.1-nemotron-70b-instruct (open) | 1298 | +7/-7 |
| 99 | hunyuan-large-vision | 1296 | +9/-9 |
| 104 | llama-3.1-70b-instruct (open) | 1295 | +3/-3 |
| 106 | amazon-nova-pro-v1.0 | 1290 | +4/-4 |
| 104 | jamba-1.5-large (open) | 1289 | +7/-7 |
| 103 | llama-3.1-tulu-3-70b (open) | 1289 | +10/-10 |
| 104 | reka-core 9/4 2024 | 1289 | +7/-7 |
| 107 | gpt-4 3/14 | 1288 | +5/-5 |
| 109 | gemma-2-27b-it (open) | 1287 | +3/-3 |
| 104 | llama-3.1-nemotron-51b-instruct (open) | 1287 | +10/-10 |
| 110 | gemma-2-9b-it-simpo (open) | 1280 | +7/-7 |
| 111 | nemotron-4-340b-instruct (open) | 1280 | +5/-5 |
| 110 | command-r-plus Aug 2024 (open) | 1279 | +6/-6 |
| 115 | llama-3-70b-instruct (open) | 1277 | +3/-3 |
| 115 | gpt-4 6/13 | 1276 | +4/-4 |
| 113 | glm-4 5/20 | 1276 | +7/-7 |
| 114 | reka-flash 9/4 2024 | 1276 | +7/-7 |
| 115 | mistral-small-24b-instruct Jan (open) | 1276 | +6/-6 |
| 115 | qwen2.5-coder-32b-instruct (open) | 1272 | +8/-8 |
| 120 | c4ai-aya-expanse-32b (open) | 1269 | +5/-5 |
| 123 | command-r-plus (open) | 1266 | +4/-4 |
| 125 | gemma-2-9b-it (open) | 1265 | +4/-4 |
| 123 | qwen2-72b-instruct (open) | 1265 | +5/-5 |
| 125 | claude-3-haiku 3/7 2024 | 1263 | +4/-4 |
| 125 | amazon-nova-lite-v1.0 | 1262 | +5/-5 |
| 125 | gemini-1.5-flash-8b v1 | 1262 | +4/-4 |
| 127 | phi-4 (open) | 1259 | +4/-4 |
| 125 | olmo-2-0325-32b-instruct | 1256 | +11/-11 |
| 128 | command-r Aug 2024 (open) | 1255 | +6/-6 |
| 134 | mistral-large Feb 2024 | 1245 | +5/-5 |
| 134 | amazon-nova-micro-v1.0 | 1245 | +5/-5 |
| 135 | jamba-1.5-mini (open) | 1241 | +7/-7 |
| 134 | ministral-8b Oct 2024 (open) | 1241 | +9/-9 |
| 135 | hunyuan-standard-256k | 1237 | +11/-11 |
| 136 | reka-flash-21b-online 2/26 2024 | 1236 | +7/-7 |
| 138 | mixtral-8x22b-instruct v0.1 (open) | 1233 | +4/-4 |
| 138 | command-r (open) | 1232 | +5/-5 |
| 138 | reka-flash-21b 2/26 2024 | 1230 | +6/-6 |
| 138 | c4ai-aya-expanse-8b (open) | 1228 | +7/-7 |
| 139 | mistral-medium | 1227 | +5/-5 |
| 140 | gpt-3.5-turbo 1/25 | 1226 | +5/-5 |
| 140 | llama-3-8b-instruct (open) | 1226 | +3/-3 |
| 138 | llama-3.1-tulu-3-8b (open) | 1225 | +10/-10 |
| 145 | yi-1.5-34b-chat (open) | 1218 | +5/-5 |
| 142 | zephyr-orpo-141b-A35b v0.1 (open) | 1217 | +10/-10 |
| 149 | llama-3.1-8b-instruct (open) | 1215 | +4/-4 |
| 144 | granite-3.1-8b-instruct (open) | 1214 | +10/-10 |
| 151 | gpt-3.5-turbo 11/6 | 1204 | +9/-9 |
| 152 | phi-3-medium-4k-instruct (open) | 1202 | +5/-5 |
| 153 | mixtral-8x7b-instruct v0.1 (open) | 1202 | +4/-4 |
| 152 | internlm2_5-20b-chat (open) | 1199 | +7/-7 |
| 153 | dbrx-instruct-preview (open) | 1199 | +6/-6 |
| 154 | wizardlm-70b (open) | 1189 | +9/-9 |
| 156 | granite-3.0-8b-instruct (open) | 1189 | +8/-8 |
| 157 | yi-34b-chat (open) | 1188 | +7/-7 |
| 157 | openchat-3.5 1/6 (open) | 1187 | +8/-8 |
| 157 | openchat-3.5 (open) | 1185 | +10/-10 |
| 156 | granite-3.1-2b-instruct (open) | 1185 | +11/-11 |
| 159 | snowflake-arctic-instruct (open) | 1184 | +6/-6 |
| 159 | openhermes-2.5-mistral-7b (open) | 1179 | +10/-10 |
| 159 | vicuna-33b (open) | 1178 | +6/-6 |
| 159 | starling-lm-7b-beta (open) | 1177 | +7/-7 |
| 159 | phi-3-small-8k-instruct (open) | 1177 | +6/-6 |
| 160 | starling-lm-7b-alpha (open) | 1173 | +8/-8 |
| 161 | llama-3.2-3b-instruct (open) | 1173 | +7/-7 |
| 159 | nous-hermes-2-mixtral-8x7b-dpo (open) | 1171 | +12/-12 |
| 167 | granite-3.0-2b-instruct (open) | 1163 | +8/-8 |
| 166 | solar-10.7b-instruct v1 | 1159 | +13/-13 |
| 166 | dolphin-2.2.1-mistral-7b | 1158 | +15/-15 |
| 172 | mistral-7b-instruct v0.2 (open) | 1156 | +6/-6 |
| 170 | mpt-30b-chat (open) | 1155 | +12/-12 |
| 172 | wizardlm-13b (open) | 1155 | +9/-9 |
| 170 | falcon-180b-chat (open) | 1150 | +17/-17 |
| 174 | phi-3-mini-4k-instruct-june-2024 (open) | 1149 | +6/-6 |
| 174 | vicuna-13b (open) | 1146 | +7/-7 |
| 174 | codellama-34b-instruct (open) | 1142 | +9/-9 |
| 175 | palm-2 | 1139 | +9/-9 |
| 177 | phi-3-mini-128k-instruct (open) | 1137 | +7/-7 |
| 177 | zephyr-7b-beta (open) | 1137 | +9/-9 |
| 180 | phi-3-mini-4k-instruct (open) | 1135 | +6/-6 |
| 175 | zephyr-7b-alpha (open) | 1133 | +16/-16 |
| 177 | guanaco-33b (open) | 1132 | +12/-12 |
| 177 | smollm2-1.7b-instruct (open) | 1130 | +13/-13 |
| 178 | codellama-70b-instruct (open) | 1125 | +18/-18 |
| 181 | stripedhyena-nous-7b (open) | 1125 | +11/-11 |
| 185 | llama-3.2-1b-instruct (open) | 1122 | +7/-7 |
| 186 | vicuna-7b (open) | 1119 | +9/-9 |
| 187 | mistral-7b-instruct (open) | 1115 | +9/-9 |
| 195 | olmo-7b-instruct (open) | 1080 | +11/-11 |
| 195 | koala-13b (open) | 1075 | +10/-10 |
| 195 | gpt4all-13b-snoozy (open) | 1067 | +15/-15 |
| 195 | alpaca-13b | 1067 | +11/-11 |
| 195 | mpt-7b-chat (open) | 1065 | +12/-12 |
| 195 | chatglm3-6b (open) | 1060 | +12/-12 |
| 197 | RWKV-4-Raven-14B (open) | 1045 | +11/-11 |
| 201 | chatglm2-6b (open) | 1031 | +13/-13 |
| 201 | oasst-pythia-12b (open) | 1025 | +11/-11 |
| 204 | chatglm-6b (open) | 1001 | +13/-13 |
| 204 | fastchat-t5-3b (open) | 995 | +12/-12 |
| 204 | dolly-v2-12b (open) | 980 | +14/-14 |
| 206 | stablelm-tuned-alpha-7b (open) | 956 | +13/-13 |

Remember: a roughly 70-point rating difference corresponds to about a 60% win rate for the higher-rated model.
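That rule of thumb follows from the logistic (Elo/Bradley-Terry style) model with the standard 400-point scale, which LM Arena uses to fit its ratings; a minimal sketch, assuming that scale:

```python
def win_rate(delta: float) -> float:
    """Expected win rate for a model rated `delta` points above its
    opponent, under a logistic Elo model with a 400-point scale."""
    return 1.0 / (1.0 + 10.0 ** (-delta / 400.0))

# A 70-point gap gives roughly a 60% expected win rate:
print(round(win_rate(70), 3))  # -> 0.599
```

For example, the ~100-point gap between gemini-2.5-pro (1456) and deepseek-v3 (1357) on the board above implies an expected win rate of about 64%, not a blowout.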