Language Model Board, a better way to read the LM Arena results

| Rank | Model | Rating | 95% CI |
|---|---|---|---|
| 1 | gemini-2.5-pro (new) | 1467 | +6/-5 |
| 2 | o3 4/16 | 1451 | +5/-4 |
| 3 | chatgpt-4o-latest 3/26 | 1442 | +5/-4 |
| 3 | gpt-4.5-preview 2/27 | 1437 | +5/-4 |
| 5 | claude-opus-4 5/14 | 1418 | +4/-3 |
| 5 | gemini-2.5-flash (new) | 1418 | +5/-4 |
| 5 | deepseek-r1 5/28 (open) | 1413 | +5/-5 |
| 5 | gpt-4.1 4/14 | 1411 | +5/-4 |
| 6 | grok-3-preview 2/24 | 1409 | +4/-3 |
| 10 | o1 12/17 2024 | 1399 | +3/-4 |
| 10 | chatgpt-4o-latest 11/20 2024 | 1398 | +3/-3 |
| 10 | o4-mini 4/16 | 1398 | +5/-4 |
| 10 | deepseek-v3 3/24 (open) | 1397 | +5/-4 |
| 10 | qwen3-235b-a22b No (open) | 1396 | +5/-5 |
| 10 | deepseek-r1 (open) | 1395 | +4/-3 |
| 10 | claude-sonnet-4 5/14 | 1393 | +5/-5 |
| 10 | minimax-m1 (new, open) | 1391 | +7/-9 |
| 15 | claude-3-7-sonnet 2/19 32k | 1387 | +4/-4 |
| 17 | o1-preview | 1383 | +3/-3 |
| 17 | mistral-medium May | 1381 | +6/-4 |
| 17 | hunyuan-turbos 4/16 | 1379 | +6/-6 |
| 20 | claude-3-7-sonnet 2/19 | 1376 | +4/-3 |
| 20 | gpt-4.1-mini 4/14 | 1374 | +4/-3 |
| 21 | qwen3-235b-a22b (open) | 1369 | +5/-4 |
| 24 | claude-3-5-sonnet 10/22 2024 | 1367 | +2/-2 |
| 24 | o3-mini-high | 1366 | +4/-4 |
| 24 | gemini-2.0-flash v1 | 1364 | +3/-2 |
| 25 | gemma-3-27b-it (open) | 1362 | +3/-4 |
| 28 | deepseek-v3 (open) | 1357 | +4/-4 |
| 26 | grok-3-mini-beta | 1357 | +6/-7 |
| 29 | gemini-2.0-flash-lite-preview 2/5 | 1350 | +3/-3 |
| 30 | gemini-1.5-pro v2 | 1348 | +2/-2 |
| 30 | o3-mini | 1348 | +3/-3 |
| 31 | command-a Mar (open) | 1345 | +4/-4 |
| 29 | hunyuan-turbos 2/26 | 1345 | +11/-10 |
| 33 | gpt-4o 5/13 2024 | 1344 | +2/-2 |
| 31 | gemma-3-12b-it (open) | 1341 | +8/-10 |
| 34 | claude-3-5-sonnet 6/20 2024 | 1341 | +2/-2 |
| 31 | hunyuan-turbo 1/10 | 1339 | +11/-11 |
| 35 | qwq-32b (open) | 1337 | +5/-4 |
| 34 | glm-4-plus 1/11 | 1336 | +8/-6 |
| 37 | o1-mini | 1335 | +2/-2 |
| 37 | llama-3.1-405b-instruct-bf16 (open) | 1334 | +2/-3 |
| 37 | llama-3.1-405b-instruct-fp8 (open) | 1333 | +2/-2 |
| 37 | step-2-16k-exp Dec 2024 | 1331 | +6/-6 |
| 39 | gpt-4o 8/6 2024 | 1331 | +2/-3 |
| 38 | llama-4-maverick-17b-128e-instruct (open) | 1329 | +4/-4 |
| 40 | grok-2 8/13 2024 | 1329 | +3/-2 |
| 43 | yi-lightning | 1327 | +3/-3 |
| 37 | llama-3.3-nemotron-49b-super v1 (open) | 1327 | +11/-11 |
| 49 | gpt-4-turbo 4/9 2024 | 1323 | +1/-2 |
| 43 | hunyuan-large 2/10 | 1322 | +8/-6 |
| 44 | gpt-4.1-nano 4/14 | 1320 | +8/-6 |
| 47 | amazon-nova-experimental-chat 5/14 | 1320 | +6/-6 |
| 50 | claude-3-opus 2/29 2024 | 1319 | +2/-2 |
| 50 | yi-lightning-lite | 1319 | +3/-5 |
| 51 | glm-4-plus | 1315 | +3/-3 |
| 50 | llama-4-scout-17b-16e-instruct (open) | 1315 | +8/-8 |
| 52 | claude-3-5-haiku 10/22 2024 | 1315 | +3/-3 |
| 52 | gpt-4-1106-preview | 1315 | +2/-2 |
| 52 | llama-3.3-70b-instruct (open) | 1315 | +3/-2 |
| 52 | gpt-4o-mini 7/18 2024 | 1314 | +2/-2 |
| 56 | mistral-large Jul 2024 (open) | 1312 | +2/-3 |
| 51 | gemma-3n-e4b-it | 1311 | +8/-8 |
| 56 | gpt-4-0125-preview | 1311 | +2/-2 |
| 56 | athene-v2-chat (open) | 1311 | +3/-2 |
| 51 | hunyuan-standard 2/10 | 1310 | +9/-9 |
| 62 | gemini-1.5-flash v2 | 1309 | +2/-3 |
| 51 | magistral-medium Jun (new) | 1308 | +11/-10 |
| 65 | grok-2-mini 8/13 2024 | 1305 | +2/-2 |
| 67 | athene-70b 7/25 (open) | 1303 | +4/-3 |
| 67 | qwen2.5-72b-instruct (open) | 1302 | +3/-2 |
| 67 | mistral-large Nov 2024 (open) | 1301 | +3/-3 |
| 64 | gemma-3-4b-it (open) | 1301 | +8/-8 |
| 68 | llama-3.1-nemotron-70b-instruct (open) | 1296 | +6/-8 |
| 67 | mistral-small-3.1-24b-instruct Mar (open) | 1296 | +8/-6 |
| 74 | llama-3.1-70b-instruct (open) | 1294 | +2/-2 |
| 72 | llama-3.1-tulu-3-70b (open) | 1289 | +10/-9 |
| 76 | amazon-nova-pro-v1.0 | 1288 | +3/-3 |
| 74 | reka-core 9/4 2024 | 1288 | +6/-5 |
| 77 | gemma-2-27b-it (open) | 1288 | +2/-2 |
| 77 | yi-large-preview | 1287 | +2/-2 |
| 76 | jamba-1.5-large (open) | 1286 | +6/-4 |
| 74 | llama-3.1-nemotron-51b-instruct (open) | 1285 | +9/-9 |
| 82 | gpt-4 3/14 | 1280 | +2/-3 |
| 80 | gemma-2-9b-it-simpo (open) | 1280 | +5/-5 |
| 83 | claude-3-sonnet 2/29 2024 | 1278 | +2/-2 |
| 81 | command-r-plus Aug 2024 (open) | 1278 | +5/-5 |
| 84 | nemotron-4-340b-instruct (open) | 1277 | +3/-4 |
| 85 | llama-3-70b-instruct (open) | 1276 | +2/-2 |
| 84 | yi-large | 1275 | +3/-4 |
| 84 | reka-flash 9/4 2024 | 1273 | +6/-6 |
| 88 | mistral-small-24b-instruct Jan (open) | 1271 | +3/-4 |
| 85 | qwen2.5-coder-32b-instruct (open) | 1270 | +7/-7 |
| 88 | glm-4 5/20 | 1269 | +5/-6 |
| 92 | gpt-4 6/13 | 1266 | +2/-2 |
| 92 | c4ai-aya-expanse-32b (open) | 1265 | +3/-3 |
| 94 | command-r-plus (open) | 1263 | +2/-2 |
| 94 | amazon-nova-lite-v1.0 | 1262 | +3/-4 |
| 95 | qwen2-72b-instruct (open) | 1261 | +3/-3 |
| 96 | gemma-2-9b-it (open) | 1261 | +2/-3 |
| 98 | gemini-1.5-flash-8b v1 | 1260 | +2/-3 |
| 92 | hunyuan-large-vision (new) | 1259 | +11/-10 |
| 99 | claude-3-haiku 3/7 2024 | 1258 | +2/-2 |
| 99 | phi-4 (open) | 1257 | +3/-3 |
| 99 | command-r Aug 2024 (open) | 1256 | +4/-5 |
| 95 | olmo-2-0325-32b-instruct | 1255 | +8/-9 |
| 107 | amazon-nova-micro-v1.0 | 1244 | +4/-4 |
| 106 | glm-4 1/16 | 1244 | +6/-5 |
| 108 | jamba-1.5-mini (open) | 1240 | +5/-6 |
| 107 | ministral-8b Oct 2024 (open) | 1240 | +8/-7 |
| 108 | claude-1 | 1240 | +4/-4 |
| 108 | mistral-large Feb 2024 | 1238 | +3/-3 |
| 108 | hunyuan-standard-256k | 1233 | +7/-10 |
| 110 | reka-flash-21b-online 2/26 2024 | 1232 | +4/-5 |
| 110 | c4ai-aya-expanse-8b (open) | 1231 | +5/-6 |
| 114 | mixtral-8x22b-instruct v0.1 (open) | 1230 | +3/-3 |
| 112 | mistral-next | 1230 | +5/-5 |
| 114 | command-r (open) | 1228 | +2/-2 |
| 114 | claude-2.0 | 1228 | +6/-4 |
| 110 | llama-3.1-tulu-3-8b (open) | 1227 | +9/-9 |
| 117 | llama-3-8b-instruct (open) | 1223 | +2/-3 |
| 114 | gpt-3.5-turbo 3/14 | 1223 | +8/-9 |
| 117 | reka-flash-21b 2/26 2024 | 1222 | +3/-4 |
| 118 | mistral-medium | 1222 | +3/-4 |
| 122 | gpt-3.5-turbo 1/25 | 1218 | +2/-2 |
| 122 | gpt-3.5-turbo 6/13 | 1218 | +3/-3 |
| 122 | claude-2.1 | 1216 | +3/-3 |
| 118 | granite-3.1-8b-instruct (open) | 1216 | +10/-8 |
| 125 | yi-1.5-34b-chat (open) | 1214 | +4/-4 |
| 126 | llama-3.1-8b-instruct (open) | 1213 | +2/-3 |
| 122 | zephyr-orpo-141b-A35b v0.1 (open) | 1211 | +8/-8 |
| 129 | claude-instant-1 | 1208 | +5/-4 |
| 134 | phi-3-medium-4k-instruct (open) | 1199 | +4/-3 |
| 134 | internlm2_5-20b-chat (open) | 1196 | +5/-5 |
| 134 | mixtral-8x7b-instruct v0.1 (open) | 1195 | +2/-2 |
| 134 | dbrx-instruct-preview (open) | 1194 | +3/-4 |
| 135 | gpt-3.5-turbo 11/6 | 1191 | +4/-5 |
| 134 | granite-3.0-8b-instruct (open) | 1190 | +6/-8 |
| 136 | wizardlm-70b (open) | 1186 | +6/-7 |
| 135 | granite-3.1-2b-instruct (open) | 1184 | +10/-12 |
| 138 | snowflake-arctic-instruct (open) | 1183 | +3/-3 |
| 138 | yi-34b-chat (open) | 1182 | +6/-4 |
| 138 | openchat-3.5 1/6 (open) | 1181 | +5/-5 |
| 139 | phi-3-small-8k-instruct (open) | 1179 | +4/-5 |
| 139 | openchat-3.5 (open) | 1175 | +7/-5 |
| 141 | llama-3.2-3b-instruct (open) | 1174 | +6/-6 |
| 143 | starling-lm-7b-beta (open) | 1172 | +4/-5 |
| 144 | vicuna-33b (open) | 1172 | +3/-4 |
| 143 | openhermes-2.5-mistral-7b (open) | 1171 | +7/-7 |
| 147 | starling-lm-7b-alpha (open) | 1164 | +6/-6 |
| 148 | granite-3.0-2b-instruct (open) | 1162 | +6/-7 |
| 151 | pplx-70b-online | 1157 | +6/-7 |
| 151 | nous-hermes-2-mixtral-8x7b-dpo (open) | 1152 | +7/-9 |
| 151 | dolphin-2.2.1-mistral-7b | 1150 | +10/-13 |
| 153 | phi-3-mini-4k-instruct-june-2024 (open) | 1149 | +5/-5 |
| 153 | mistral-7b-instruct v0.2 (open) | 1148 | +4/-4 |
| 153 | solar-10.7b-instruct v1 | 1146 | +7/-8 |
| 153 | wizardlm-13b (open) | 1144 | +7/-8 |
| 153 | mpt-30b-chat (open) | 1141 | +10/-11 |
| 153 | falcon-180b-chat (open) | 1140 | +13/-17 |
| 157 | vicuna-13b (open) | 1139 | +4/-4 |
| 157 | phi-3-mini-4k-instruct (open) | 1137 | +3/-4 |
| 154 | smollm2-1.7b-instruct (open) | 1132 | +12/-11 |
| 161 | zephyr-7b-beta (open) | 1130 | +5/-6 |
| 161 | phi-3-mini-128k-instruct (open) | 1129 | +4/-4 |
| 160 | codellama-34b-instruct (open) | 1128 | +8/-6 |
| 157 | zephyr-7b-alpha (open) | 1127 | +12/-11 |
| 162 | llama-3.2-1b-instruct (open) | 1125 | +5/-6 |
| 163 | palm-2 | 1123 | +4/-7 |
| 163 | pplx-7b-online | 1122 | +6/-7 |
| 163 | guanaco-33b (open) | 1121 | +9/-11 |
| 161 | codellama-70b-instruct (open) | 1119 | +16/-13 |
| 167 | stripedhyena-nous-7b (open) | 1115 | +7/-7 |
| 172 | mistral-7b-instruct (open) | 1107 | +5/-6 |
| 173 | vicuna-7b (open) | 1102 | +6/-7 |
| 177 | olmo-7b-instruct (open) | 1074 | +8/-8 |
| 177 | koala-13b (open) | 1061 | +7/-7 |
| 178 | chatglm3-6b (open) | 1055 | +10/-10 |
| 178 | gpt4all-13b-snoozy (open) | 1053 | +13/-14 |
| 178 | mpt-7b-chat (open) | 1048 | +8/-10 |
| 178 | alpaca-13b | 1048 | +8/-8 |
| 183 | RWKV-4-Raven-14B (open) | 1030 | +8/-7 |
| 183 | chatglm2-6b (open) | 1024 | +11/-11 |
| 184 | oasst-pythia-12b (open) | 1012 | +7/-8 |
| 186 | fastchat-t5-3b (open) | 977 | +9/-8 |
| 186 | chatglm-6b (open) | 969 | +7/-9 |
| 186 | dolly-v2-12b (open) | 963 | +10/-11 |
| 189 | stablelm-tuned-alpha-7b (open) | 940 | +10/-11 |

Remember: a model needs roughly a 70-point rating advantage over an opponent to reach a 60% win rate against it.
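The 70-point rule of thumb comes from the standard Elo win-probability formula, P(win) = 1 / (1 + 10^(-diff/400)). This is a sketch under the assumption that these arena ratings use the usual 400-point Elo scale; the function name is illustrative, not part of any arena API:

```python
def win_probability(rating_diff: float) -> float:
    """Probability the higher-rated model wins, given the rating gap
    (standard Elo logistic curve with a 400-point scale -- an assumption
    about how these arena ratings are scaled)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))

print(win_probability(70))   # ~0.60: a 70-point gap gives roughly a 60% win rate
print(win_probability(0))    # 0.5: equal ratings means a coin flip
```

Note the curve is shallow near zero: small rating gaps (a few points, often within the 95% CI) translate to near-coin-flip win rates, which is why adjacent rows on the board are effectively tied.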