Hacker News

SWE-bench Verified no longer measures frontier coding capabilities

292 points | openai.com
kmdupree | 22 hrs