Language Model Board, a better way to read the LM Arena results

| Rank | Model | Tags | Rating | 95% CI |
|------|-------|------|--------|--------|
| 1 | o3-pro | est | 1473 | +5/-3 |
| 2 | gemini-2.5-pro | | 1462 | +4/-5 |
| 3 | o3 4/16 | | 1452 | +3/-4 |
| 4 | chatgpt-4o-latest 3/26 | | 1443 | +3/-3 |
| 4 | gpt-4.5-preview 2/27 | | 1437 | +6/-5 |
| 4 | grok-4 7/9 | | 1437 | +6/-7 |
| 6 | kimi-k2-0711-preview | new open | 1420 | +11/-13 |
| 7 | claude-opus-4 5/14 16k | | 1420 | +4/-5 |
| 7 | claude-opus-4 5/14 | | 1416 | +4/-4 |
| 7 | gemini-2.5-flash | No est | 1415 | +0/-8 |
| 7 | deepseek-r1 5/28 | open | 1415 | +5/-5 |
| 7 | gemini-2.5-flash | | 1414 | +3/-4 |
| 7 | gpt-4.1 4/14 | | 1412 | +3/-4 |
| 9 | grok-3-preview 2/24 | | 1409 | +3/-4 |
| 12 | claude-sonnet-4 5/14 32k | | 1402 | +5/-6 |
| 15 | qwen3-235b-a22b | No open | 1400 | +4/-5 |
| 15 | o4-mini 4/16 | | 1400 | +4/-5 |
| 15 | o1 12/17 2024 | | 1399 | +3/-3 |
| 15 | chatgpt-4o-latest 11/20 2024 | | 1398 | +3/-3 |
| 15 | deepseek-v3 3/24 | open | 1397 | +5/-4 |
| 15 | deepseek-r1 | open | 1395 | +4/-5 |
| 15 | claude-sonnet-4 5/14 | | 1392 | +5/-5 |
| 22 | claude-3-7-sonnet 2/19 32k | | 1385 | +3/-4 |
| 22 | mistral-medium May | | 1385 | +4/-4 |
| 22 | minimax-m1 | open | 1383 | +6/-6 |
| 23 | o1-preview | | 1382 | +4/-3 |
| 23 | hunyuan-turbos 4/16 | | 1382 | +6/-7 |
| 26 | gpt-4.1-mini 4/14 | | 1375 | +4/-5 |
| 26 | claude-3-7-sonnet 2/19 | | 1375 | +3/-4 |
| 27 | qwen3-235b-a22b | open | 1372 | +5/-5 |
| 30 | claude-3-5-sonnet 10/22 2024 | | 1366 | +2/-2 |
| 29 | o3-mini-high | | 1366 | +4/-5 |
| 30 | gemma-3-27b-it | open | 1365 | +3/-4 |
| 31 | gemini-2.0-flash v1 | | 1363 | +3/-3 |
| 32 | grok-3-mini-high | | 1357 | +6/-8 |
| 34 | deepseek-v3 | open | 1357 | +3/-4 |
| 35 | grok-3-mini-beta | | 1353 | +4/-5 |
| 35 | gemini-2.0-flash-lite-preview 2/5 | | 1350 | +3/-3 |
| 35 | mistral-small Jun | open | 1349 | +8/-8 |
| 36 | gemini-1.5-pro v2 | | 1348 | +2/-3 |
| 35 | qwen3-32b | open | 1348 | +8/-9 |
| 36 | command-a Mar | open | 1347 | +3/-3 |
| 37 | o3-mini | | 1346 | +3/-3 |
| 35 | llama-3.1-nemotron-ultra-253b v1 | | 1346 | +10/-10 |
| 35 | hunyuan-turbos 2/26 | | 1344 | +10/-10 |
| 40 | gpt-4o 5/13 2024 | | 1343 | +2/-2 |
| 36 | gemma-3-12b-it | open | 1343 | +8/-10 |
| 42 | claude-3-5-sonnet 6/20 2024 | | 1340 | +2/-2 |
| 37 | hunyuan-turbo 1/10 | | 1338 | +11/-9 |
| 42 | glm-4-plus 1/11 | | 1336 | +5/-7 |
| 44 | qwq-32b | open | 1336 | +3/-4 |
| 46 | o1-mini | | 1335 | +2/-2 |
| 46 | llama-3.1-405b-instruct-bf16 | open | 1334 | +3/-3 |
| 47 | llama-3.1-405b-instruct-fp8 | open | 1333 | +2/-2 |
| 47 | llama-4-maverick-17b-128e-instruct | open | 1331 | +4/-4 |
| 44 | step-2-16k-exp Dec 2024 | | 1331 | +8/-7 |
| 49 | gpt-4o 8/6 2024 | | 1330 | +3/-2 |
| 51 | grok-2 8/13 2024 | | 1329 | +3/-2 |
| 50 | qwen3-30b-a3b | open | 1328 | +4/-4 |
| 54 | yi-lightning | | 1326 | +4/-3 |
| 46 | llama-3.3-nemotron-49b-super v1 | open | 1326 | +11/-11 |
| 55 | hunyuan-large 2/10 | | 1322 | +7/-9 |
| 59 | gpt-4-turbo 4/9 2024 | | 1322 | +2/-2 |
| 56 | llama-4-scout-17b-16e-instruct | open | 1321 | +6/-8 |
| 58 | gpt-4.1-nano 4/14 | | 1321 | +5/-6 |
| 61 | claude-3-opus 2/29 2024 | | 1319 | +2/-1 |
| 61 | yi-lightning-lite | | 1318 | +4/-4 |
| 60 | amazon-nova-experimental-chat 5/14 | | 1318 | +5/-6 |
| 62 | claude-3-5-haiku 10/22 2024 | | 1315 | +2/-3 |
| 63 | llama-3.3-70b-instruct | open | 1315 | +2/-2 |
| 62 | glm-4-plus | | 1315 | +3/-3 |
| 62 | gemma-3n-e4b-it | | 1314 | +5/-6 |
| 63 | gpt-4-1106-preview | | 1314 | +2/-1 |
| 63 | gpt-4o-mini 7/18 2024 | | 1314 | +2/-2 |
| 66 | mistral-large Jul 2024 | open | 1311 | +3/-2 |
| 67 | gpt-4-0125-preview | | 1311 | +2/-2 |
| 66 | athene-v2-chat | open | 1311 | +4/-4 |
| 62 | hunyuan-standard 2/10 | | 1310 | +9/-10 |
| 71 | gemini-1.5-flash v2 | | 1309 | +3/-2 |
| 77 | grok-2-mini 8/13 2024 | | 1304 | +3/-2 |
| 71 | magistral-medium Jun | | 1303 | +8/-7 |
| 78 | athene-70b 7/25 | open | 1302 | +4/-5 |
| 73 | gemma-3-4b-it | open | 1302 | +10/-9 |
| 79 | qwen2.5-72b-instruct | open | 1302 | +2/-3 |
| 79 | mistral-large Nov 2024 | open | 1301 | +3/-3 |
| 79 | llama-3.1-nemotron-70b-instruct | open | 1296 | +7/-6 |
| 82 | mistral-small-3.1-24b-instruct Mar | open | 1294 | +5/-6 |
| 85 | llama-3.1-70b-instruct | open | 1293 | +2/-3 |
| 82 | llama-3.1-tulu-3-70b | open | 1289 | +9/-10 |
| 85 | hunyuan-large-vision | | 1289 | +7/-6 |
| 86 | amazon-nova-pro-v1.0 | | 1288 | +3/-4 |
| 88 | gemma-2-27b-it | open | 1287 | +2/-2 |
| 85 | reka-core 9/4 2024 | | 1287 | +6/-6 |
| 88 | yi-large-preview | | 1286 | +3/-3 |
| 86 | jamba-1.5-large | open | 1285 | +8/-6 |
| 86 | llama-3.1-nemotron-51b-instruct | open | 1284 | +8/-7 |
| 92 | gpt-4 3/14 | | 1280 | +3/-3 |
| 91 | gemma-2-9b-it-simpo | open | 1279 | +4/-5 |
| 94 | claude-3-sonnet 2/29 2024 | | 1277 | +2/-2 |
| 93 | command-r-plus Aug 2024 | open | 1277 | +5/-5 |
| 95 | nemotron-4-340b-instruct | open | 1276 | +4/-5 |
| 96 | llama-3-70b-instruct | open | 1275 | +2/-2 |
| 96 | yi-large | | 1274 | +4/-5 |
| 96 | reka-flash 9/4 2024 | | 1272 | +5/-4 |
| 98 | mistral-small-24b-instruct Jan | open | 1271 | +5/-4 |
| 97 | qwen2.5-coder-32b-instruct | open | 1270 | +7/-7 |
| 100 | glm-4 5/20 | | 1268 | +5/-5 |
| 104 | gpt-4 6/13 | | 1266 | +3/-2 |
| 105 | c4ai-aya-expanse-32b | open | 1265 | +3/-3 |
| 106 | command-r-plus | open | 1263 | +3/-2 |
| 106 | amazon-nova-lite-v1.0 | | 1261 | +3/-3 |
| 106 | qwen2-72b-instruct | open | 1260 | +3/-3 |
| 109 | gemma-2-9b-it | open | 1260 | +2/-2 |
| 109 | gemini-1.5-flash-8b v1 | | 1259 | +3/-3 |
| 111 | claude-3-haiku 3/7 2024 | | 1258 | +2/-2 |
| 111 | phi-4 | open | 1257 | +3/-3 |
| 109 | command-r Aug 2024 | open | 1256 | +6/-5 |
| 106 | olmo-2-0325-32b-instruct | | 1256 | +11/-7 |
| 119 | amazon-nova-micro-v1.0 | | 1243 | +5/-4 |
| 118 | glm-4 1/16 | | 1243 | +7/-5 |
| 119 | jamba-1.5-mini | open | 1240 | +5/-7 |
| 119 | ministral-8b Oct 2024 | open | 1239 | +6/-8 |
| 119 | claude-1 | | 1239 | +4/-4 |
| 119 | mistral-large Feb 2024 | | 1238 | +2/-2 |
| 119 | hunyuan-standard-256k | | 1232 | +11/-9 |
| 121 | reka-flash-21b-online 2/26 2024 | | 1231 | +5/-4 |
| 121 | c4ai-aya-expanse-8b | open | 1230 | +5/-4 |
| 123 | mistral-next | | 1229 | +5/-5 |
| 124 | mixtral-8x22b-instruct v0.1 | open | 1229 | +3/-2 |
| 125 | command-r | open | 1227 | +3/-2 |
| 125 | claude-2.0 | | 1227 | +5/-5 |
| 120 | llama-3.1-tulu-3-8b | open | 1226 | +11/-9 |
| 125 | gpt-3.5-turbo 3/14 | | 1224 | +6/-8 |
| 129 | llama-3-8b-instruct | open | 1222 | +2/-2 |
| 128 | reka-flash-21b 2/26 2024 | | 1222 | +4/-4 |
| 130 | mistral-medium | | 1221 | +3/-3 |
| 133 | gpt-3.5-turbo 1/25 | | 1217 | +3/-2 |
| 133 | gpt-3.5-turbo 6/13 | | 1217 | +3/-3 |
| 133 | claude-2.1 | | 1215 | +4/-3 |
| 129 | granite-3.1-8b-instruct | open | 1215 | +10/-8 |
| 136 | yi-1.5-34b-chat | open | 1213 | +3/-4 |
| 137 | llama-3.1-8b-instruct | open | 1212 | +3/-2 |
| 133 | zephyr-orpo-141b-A35b v0.1 | open | 1211 | +8/-9 |
| 140 | claude-instant-1 | | 1208 | +4/-4 |
| 145 | phi-3-medium-4k-instruct | open | 1198 | +3/-3 |
| 145 | internlm2_5-20b-chat | open | 1195 | +6/-5 |
| 145 | mixtral-8x7b-instruct v0.1 | open | 1194 | +3/-2 |
| 145 | dbrx-instruct-preview | open | 1193 | +3/-4 |
| 146 | gpt-3.5-turbo 11/6 | | 1190 | +5/-4 |
| 145 | granite-3.0-8b-instruct | open | 1189 | +6/-9 |
| 148 | wizardlm-70b | open | 1185 | +6/-5 |
| 146 | granite-3.1-2b-instruct | open | 1183 | +10/-8 |
| 149 | snowflake-arctic-instruct | open | 1182 | +4/-4 |
| 150 | yi-34b-chat | open | 1181 | +4/-4 |
| 149 | openchat-3.5 1/6 | open | 1180 | +6/-5 |
| 150 | phi-3-small-8k-instruct | open | 1178 | +5/-4 |
| 150 | openchat-3.5 | open | 1175 | +6/-5 |
| 152 | llama-3.2-3b-instruct | open | 1174 | +6/-6 |
| 154 | vicuna-33b | open | 1172 | +4/-4 |
| 154 | starling-lm-7b-beta | open | 1172 | +4/-4 |
| 153 | openhermes-2.5-mistral-7b | open | 1170 | +7/-8 |
| 158 | starling-lm-7b-alpha | open | 1163 | +6/-5 |
| 158 | granite-3.0-2b-instruct | open | 1161 | +8/-7 |
| 161 | pplx-70b-online | | 1156 | +8/-8 |
| 162 | nous-hermes-2-mixtral-8x7b-dpo | open | 1151 | +8/-7 |
| 162 | dolphin-2.2.1-mistral-7b | | 1150 | +11/-14 |
| 164 | phi-3-mini-4k-instruct-june-2024 | open | 1148 | +5/-6 |
| 164 | mistral-7b-instruct v0.2 | open | 1147 | +4/-5 |
| 163 | solar-10.7b-instruct v1 | | 1146 | +9/-6 |
| 164 | wizardlm-13b | open | 1144 | +8/-7 |
| 164 | mpt-30b-chat | open | 1141 | +10/-11 |
| 163 | falcon-180b-chat | open | 1139 | +18/-14 |
| 168 | vicuna-13b | open | 1139 | +4/-6 |
| 168 | phi-3-mini-4k-instruct | open | 1136 | +3/-4 |
| 166 | smollm2-1.7b-instruct | open | 1132 | +11/-11 |
| 171 | zephyr-7b-beta | open | 1129 | +6/-5 |
| 172 | phi-3-mini-128k-instruct | open | 1129 | +4/-5 |
| 171 | codellama-34b-instruct | open | 1128 | +7/-7 |
| 169 | zephyr-7b-alpha | open | 1127 | +10/-13 |
| 173 | llama-3.2-1b-instruct | open | 1125 | +7/-6 |
| 173 | palm-2 | | 1123 | +8/-7 |
| 174 | pplx-7b-online | | 1122 | +6/-7 |
| 173 | guanaco-33b | open | 1121 | +10/-11 |
| 171 | codellama-70b-instruct | open | 1119 | +17/-20 |
| 177 | stripedhyena-nous-7b | open | 1114 | +9/-6 |
| 183 | mistral-7b-instruct | open | 1107 | +6/-6 |
| 185 | vicuna-7b | open | 1102 | +6/-6 |
| 188 | olmo-7b-instruct | open | 1073 | +7/-5 |
| 189 | koala-13b | open | 1061 | +6/-7 |
| 189 | chatglm3-6b | open | 1054 | +10/-8 |
| 189 | gpt4all-13b-snoozy | open | 1053 | +13/-12 |
| 189 | alpaca-13b | | 1048 | +8/-9 |
| 189 | mpt-7b-chat | open | 1048 | +8/-10 |
| 192 | RWKV-4-Raven-14B | open | 1030 | +11/-9 |
| 194 | chatglm2-6b | open | 1022 | +13/-11 |
| 195 | oasst-pythia-12b | open | 1012 | +7/-7 |
| 197 | fastchat-t5-3b | open | 977 | +11/-11 |
| 197 | chatglm-6b | open | 969 | +8/-9 |
| 197 | dolly-v2-12b | open | 963 | +9/-10 |
| 200 | stablelm-tuned-alpha-7b | open | 940 | +10/-11 |

Remember: a model needs a 70-point rating advantage to reach a 60% expected win rate.
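
The 70-point rule of thumb follows from the Elo model these ratings use, where the expected win rate for a rating gap d is 1 / (1 + 10^(-d/400)). A minimal sketch (the helper name is hypothetical, not part of the site):

```python
def win_rate(rating_diff: float) -> float:
    """Expected win probability under the Elo model: 1 / (1 + 10^(-d/400))."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

# A 70-point gap gives roughly a 60% expected win rate:
print(round(win_rate(70), 3))  # 0.6
```

The same formula shows why adjacent leaderboard entries are near coin flips: a 5-point gap works out to about a 50.7% expected win rate.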