Senior SWE-Bench: open-source benchmark that assesses agents as senior engineers
145 points by matt_d 15 hours ago | 98 comments