Software engineering agents (SWE agents) can autonomously perform development tasks on benchmarks such as SWE-bench, but they still struggle with complex, ambiguous real-world tasks.