On SWE-Bench Verified, the model achieved a score of 70.6%. This performance is notably competitive when placed alongside significantly larger models; it outpaces DeepSeek-V3.2, which scores 70.2%, ...
Key cyber updates on ransomware, cloud intrusions, phishing, botnets, supply-chain risks, and nation-state threat activity.
See something others should know about? Email CHS or call/txt (206) 399-5959. You can view recent CHS 911 coverage here. Hear sirens and wondering what’s going on? Check out reports ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results