Language Model Board, a better way to read the LM Arena results

| Rank | Model | Rating | 95% CI |
|---:|---|---:|---|
| 1 | gemini-2.5-pro-preview 6/5 (new) | 1470 | +8/-9 |
| 2 | gemini-2.5-pro-preview 5/6 | 1446 | +6/-7 |
| 2 | o3 4/16 | 1443 | +4/-5 |
| 4 | chatgpt-4o-latest 3/26 | 1431 | +4/-4 |
| 4 | gpt-4.5-preview 2/27 | 1425 | +5/-4 |
| 5 | gemini-2.5-flash-preview 5/20 | 1419 | +6/-7 |
| 6 | claude-opus-4 5/14 | 1414 | +6/-6 |
| 8 | gpt-4.1 4/14 | 1402 | +4/-5 |
| 8 | grok-3-preview 2/24 | 1399 | +3/-6 |
| 9 | o4-mini 4/16 | 1390 | +5/-5 |
| 9 | claude-sonnet-4 5/14 | 1390 | +5/-6 |
| 9 | deepseek-v3 3/24 (open) | 1388 | +5/-4 |
| 10 | o1 12/17 2024 | 1388 | +3/-4 |
| 10 | chatgpt-4o-latest 11/20 2024 | 1387 | +2/-2 |
| 10 | deepseek-r1 (open) | 1384 | +4/-4 |
| 16 | o1-preview | 1372 | +3/-3 |
| 16 | mistral-medium May | 1369 | +6/-5 |
| 16 | claude-3-7-sonnet 2/19 | 1365 | +4/-3 |
| 16 | gpt-4.1-mini 4/14 | 1365 | +4/-6 |
| 16 | hunyuan-turbos 4/16 | 1363 | +8/-8 |
| 17 | claude-3-7-sonnet-thinking-32k 2/19 | 1363 | +4/-4 |
| 21 | gemini-2.0-flash v1 | 1355 | +3/-3 |
| 21 | claude-3-5-sonnet 10/22 2024 | 1355 | +2/-2 |
| 21 | o3-mini-high | 1355 | +4/-4 |
| 21 | gemma-3-27b-it (open) | 1353 | +5/-3 |
| 26 | deepseek-v3 (open) | 1346 | +3/-3 |
| 26 | qwen3-235b-a22b (open) | 1341 | +5/-7 |
| 27 | gemini-2.0-flash-lite-preview 2/5 | 1339 | +4/-3 |
| 27 | o3-mini | 1339 | +3/-4 |
| 27 | gemini-1.5-pro v2 | 1337 | +3/-2 |
| 27 | command-a Mar (open) | 1336 | +3/-4 |
| 26 | hunyuan-turbos 2/26 | 1334 | +10/-11 |
| 30 | gpt-4o 5/13 2024 | 1332 | +2/-2 |
| 27 | gemma-3-12b-it (open) | 1330 | +10/-9 |
| 33 | claude-3-5-sonnet 6/20 2024 | 1329 | +2/-2 |
| 27 | hunyuan-turbo 1/10 | 1328 | +8/-12 |
| 32 | glm-4-plus 1/11 | 1326 | +6/-7 |
| 33 | o1-mini | 1325 | +2/-2 |
| 34 | llama-3.1-405b-instruct-bf16 (open) | 1323 | +2/-3 |
| 34 | llama-3.1-405b-instruct-fp8 (open) | 1322 | +2/-2 |
| 33 | step-2-16k-exp Dec 2024 | 1320 | +8/-8 |
| 36 | gpt-4o 8/6 2024 | 1319 | +2/-3 |
| 34 | llama-4-maverick-17b-128e-instruct (open) | 1319 | +4/-4 |
| 37 | grok-2 8/13 2024 | 1318 | +2/-2 |
| 39 | yi-lightning | 1316 | +3/-3 |
| 34 | llama-3.3-nemotron-49b-super v1 (open) | 1315 | +11/-12 |
| 44 | gpt-4-turbo 4/9 2024 | 1311 | +2/-2 |
| 37 | hunyuan-large 2/10 | 1311 | +10/-10 |
| 40 | grok-3-mini-beta | 1310 | +8/-6 |
| 44 | qwq-32b (open) | 1310 | +4/-5 |
| 43 | gpt-4.1-nano 4/14 | 1308 | +7/-7 |
| 46 | yi-lightning-lite | 1308 | +3/-5 |
| 46 | claude-3-opus 2/29 2024 | 1308 | +2/-2 |
| 47 | glm-4-plus | 1304 | +3/-4 |
| 48 | gpt-4-1106-preview | 1303 | +2/-2 |
| 47 | llama-3.3-70b-instruct (open) | 1303 | +3/-2 |
| 48 | gpt-4o-mini 7/18 2024 | 1303 | +2/-2 |
| 48 | claude-3-5-haiku 10/22 2024 | 1303 | +3/-3 |
| 51 | mistral-large Jul 2024 (open) | 1301 | +2/-2 |
| 52 | athene-v2-chat (open) | 1300 | +3/-4 |
| 53 | gpt-4-0125-preview | 1300 | +2/-2 |
| 47 | hunyuan-standard 2/10 | 1299 | +9/-11 |
| 57 | gemini-1.5-flash v2 | 1298 | +2/-3 |
| 48 | gemma-3n-e4b-it | 1298 | +8/-9 |
| 63 | grok-2-mini 8/13 2024 | 1293 | +2/-2 |
| 62 | athene-70b 7/25 (open) | 1292 | +4/-4 |
| 63 | qwen2.5-72b-instruct (open) | 1291 | +3/-3 |
| 63 | mistral-large Nov 2024 (open) | 1290 | +3/-3 |
| 63 | gemma-3-4b-it (open) | 1289 | +6/-8 |
| 64 | llama-3.1-nemotron-70b-instruct (open) | 1285 | +5/-6 |
| 62 | mistral-small-3.1-24b-instruct Mar (open) | 1285 | +11/-10 |
| 69 | llama-3.1-70b-instruct (open) | 1283 | +2/-2 |
| 68 | llama-3.1-tulu-3-70b (open) | 1278 | +9/-9 |
| 71 | amazon-nova-pro-v1.0 | 1277 | +4/-4 |
| 69 | reka-core 9/4 2024 | 1276 | +6/-6 |
| 72 | gemma-2-27b-it (open) | 1276 | +2/-2 |
| 72 | yi-large-preview | 1276 | +2/-2 |
| 71 | jamba-1.5-large (open) | 1274 | +5/-6 |
| 71 | llama-3.1-nemotron-51b-instruct (open) | 1273 | +7/-10 |
| 73 | gemma-2-9b-it-simpo (open) | 1269 | +6/-5 |
| 76 | gpt-4 3/14 | 1268 | +2/-3 |
| 78 | claude-3-sonnet 2/29 2024 | 1266 | +2/-2 |
| 76 | command-r-plus Aug 2024 (open) | 1266 | +6/-4 |
| 79 | nemotron-4-340b-instruct (open) | 1265 | +3/-4 |
| 79 | llama-3-70b-instruct (open) | 1264 | +2/-2 |
| 79 | yi-large | 1264 | +4/-4 |
| 79 | reka-flash 9/4 2024 | 1262 | +5/-5 |
| 82 | mistral-small-24b-instruct Jan (open) | 1260 | +4/-5 |
| 80 | qwen2.5-coder-32b-instruct (open) | 1259 | +6/-7 |
| 82 | glm-4 5/20 | 1257 | +6/-4 |
| 88 | gpt-4 6/13 | 1254 | +2/-2 |
| 88 | c4ai-aya-expanse-32b (open) | 1254 | +3/-4 |
| 89 | command-r-plus (open) | 1252 | +2/-2 |
| 89 | amazon-nova-lite-v1.0 | 1250 | +3/-4 |
| 90 | qwen2-72b-instruct (open) | 1249 | +3/-3 |
| 91 | gemma-2-9b-it (open) | 1249 | +3/-2 |
| 92 | gemini-1.5-flash-8b v1 | 1248 | +3/-2 |
| 94 | claude-3-haiku 3/7 2024 | 1247 | +2/-2 |
| 94 | phi-4 (open) | 1246 | +3/-4 |
| 93 | command-r Aug 2024 (open) | 1245 | +5/-4 |
| 90 | olmo-2-0325-32b-instruct | 1243 | +10/-8 |
| 101 | amazon-nova-micro-v1.0 | 1232 | +4/-3 |
| 101 | glm-4 1/16 | 1232 | +6/-6 |
| 101 | ministral-8b Oct 2024 (open) | 1229 | +9/-7 |
| 102 | jamba-1.5-mini (open) | 1229 | +5/-6 |
| 102 | claude-1 | 1228 | +4/-4 |
| 103 | mistral-large Feb 2024 | 1227 | +2/-2 |
| 102 | hunyuan-standard-256k | 1222 | +11/-10 |
| 104 | reka-flash-21b-online 2/26 2024 | 1220 | +5/-4 |
| 104 | c4ai-aya-expanse-8b (open) | 1220 | +5/-4 |
| 108 | mixtral-8x22b-instruct v0.1 (open) | 1218 | +2/-3 |
| 106 | mistral-next | 1218 | +6/-4 |
| 108 | command-r (open) | 1217 | +3/-2 |
| 104 | llama-3.1-tulu-3-8b (open) | 1216 | +10/-9 |
| 108 | claude-2.0 | 1215 | +5/-6 |
| 113 | llama-3-8b-instruct (open) | 1211 | +2/-2 |
| 113 | reka-flash-21b 2/26 2024 | 1211 | +3/-3 |
| 108 | gpt-3.5-turbo 3/14 | 1210 | +11/-8 |
| 112 | mistral-medium | 1210 | +4/-3 |
| 116 | gpt-3.5-turbo 1/25 | 1206 | +3/-2 |
| 116 | gpt-3.5-turbo 6/13 | 1205 | +3/-2 |
| 117 | claude-2.1 | 1204 | +3/-4 |
| 113 | granite-3.1-8b-instruct (open) | 1204 | +10/-8 |
| 119 | yi-1.5-34b-chat (open) | 1202 | +3/-4 |
| 119 | llama-3.1-8b-instruct (open) | 1202 | +3/-2 |
| 116 | zephyr-orpo-141b-A35b v0.1 (open) | 1199 | +9/-9 |
| 123 | claude-instant-1 | 1196 | +4/-3 |
| 128 | phi-3-medium-4k-instruct (open) | 1187 | +3/-3 |
| 128 | gemma-2-2b-it (open) | 1187 | +3/-2 |
| 128 | internlm2_5-20b-chat (open) | 1185 | +5/-5 |
| 128 | mixtral-8x7b-instruct v0.1 (open) | 1183 | +2/-2 |
| 128 | dbrx-instruct-preview (open) | 1182 | +3/-4 |
| 128 | granite-3.0-8b-instruct (open) | 1178 | +8/-7 |
| 130 | gpt-3.5-turbo 11/6 | 1178 | +4/-4 |
| 131 | wizardlm-70b (open) | 1174 | +6/-6 |
| 130 | granite-3.1-2b-instruct (open) | 1172 | +9/-10 |
| 133 | snowflake-arctic-instruct (open) | 1171 | +3/-3 |
| 133 | yi-34b-chat (open) | 1170 | +5/-4 |
| 133 | openchat-3.5 1/6 (open) | 1169 | +5/-4 |
| 134 | phi-3-small-8k-instruct (open) | 1167 | +4/-4 |
| 135 | openchat-3.5 (open) | 1163 | +7/-5 |
| 135 | llama-3.2-3b-instruct (open) | 1162 | +7/-7 |
| 137 | starling-lm-7b-beta (open) | 1161 | +5/-4 |
| 139 | vicuna-33b (open) | 1160 | +4/-4 |
| 138 | openhermes-2.5-mistral-7b (open) | 1158 | +7/-8 |
| 143 | starling-lm-7b-alpha (open) | 1152 | +5/-6 |
| 141 | granite-3.0-2b-instruct (open) | 1150 | +9/-7 |
| 145 | pplx-70b-online | 1145 | +7/-7 |
| 146 | nous-hermes-2-mixtral-8x7b-dpo (open) | 1140 | +8/-8 |
| 145 | dolphin-2.2.1-mistral-7b | 1138 | +15/-12 |
| 148 | phi-3-mini-4k-instruct-june-2024 (open) | 1137 | +4/-6 |
| 148 | mistral-7b-instruct v0.2 (open) | 1136 | +4/-3 |
| 148 | solar-10.7b-instruct v1 | 1134 | +8/-8 |
| 148 | wizardlm-13b (open) | 1132 | +6/-8 |
| 148 | mpt-30b-chat (open) | 1129 | +9/-12 |
| 148 | falcon-180b-chat (open) | 1127 | +16/-14 |
| 151 | vicuna-13b (open) | 1127 | +5/-4 |
| 152 | phi-3-mini-4k-instruct (open) | 1125 | +4/-4 |
| 151 | smollm2-1.7b-instruct (open) | 1121 | +11/-13 |
| 155 | zephyr-7b-beta (open) | 1118 | +5/-5 |
| 156 | phi-3-mini-128k-instruct (open) | 1117 | +4/-3 |
| 156 | codellama-34b-instruct (open) | 1116 | +7/-8 |
| 154 | zephyr-7b-alpha (open) | 1115 | +11/-15 |
| 157 | llama-3.2-1b-instruct (open) | 1114 | +5/-8 |
| 157 | palm-2 | 1111 | +8/-7 |
| 157 | pplx-7b-online | 1110 | +7/-7 |
| 157 | guanaco-33b (open) | 1109 | +11/-12 |
| 154 | codellama-70b-instruct (open) | 1107 | +18/-17 |
| 161 | stripedhyena-nous-7b (open) | 1102 | +8/-7 |
| 166 | mistral-7b-instruct (open) | 1095 | +6/-7 |
| 168 | vicuna-7b (open) | 1090 | +6/-6 |
| 172 | olmo-7b-instruct (open) | 1063 | +8/-7 |
| 172 | koala-13b (open) | 1048 | +8/-7 |
| 173 | chatglm3-6b (open) | 1043 | +9/-7 |
| 173 | gpt4all-13b-snoozy (open) | 1040 | +14/-12 |
| 173 | mpt-7b-chat (open) | 1036 | +9/-10 |
| 173 | alpaca-13b | 1035 | +11/-8 |
| 177 | RWKV-4-Raven-14B (open) | 1017 | +8/-9 |
| 178 | chatglm2-6b (open) | 1011 | +11/-9 |
| 179 | oasst-pythia-12b (open) | 999 | +8/-7 |
| 181 | fastchat-t5-3b (open) | 965 | +10/-11 |
| 181 | chatglm-6b (open) | 957 | +9/-9 |
| 181 | dolly-v2-12b (open) | 950 | +9/-10 |
| 184 | stablelm-tuned-alpha-7b (open) | 927 | +9/-11 |

Remember: a 70-point rating difference corresponds to roughly a 60% expected win rate for the higher-rated model.
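The 70-point rule of thumb falls out of the standard Elo-style win-probability formula that arena-style ratings are built on: P(win) = 1 / (1 + 10^(-Δ/400)), where Δ is the rating gap. A minimal sketch (the function name is mine, not from the leaderboard):

```python
def expected_win_rate(rating_diff: float) -> float:
    """Expected win rate for the higher-rated model under the
    Elo / Bradley-Terry model: 1 / (1 + 10^(-diff/400))."""
    return 1.0 / (1.0 + 10 ** (-rating_diff / 400))

# A 70-point gap gives about a 60% expected win rate.
print(round(expected_win_rate(70), 3))  # ~0.599
# Equal ratings give exactly 50%.
print(expected_win_rate(0))  # 0.5
```

So, for example, the ~80-point gap between the top model (1470) and deepseek-v3 3/24 (1388) implies the leader wins only about 61% of head-to-head votes.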