SWE-bench Verified no longer measures frontier coding capabilities
292 points | openai.com | kmdupree | 22hrs