AI coding benchmark MirrorCode published its full results June 26, showing Claude Opus 4.7 autonomously rebuilt a 60,000-line interpreter and scored 56% overall — completing tasks that take human ...
An agentic coding tool tasked with cloning and setting up a seemingly benign GitHub repository could execute a malicious ...
DeepReinforce today released Ornith-1.0, a family of open-source coding models built around a mechanism most RL-trained agents avoid: the model itself writes the training harness that guides its own ...
OpenAI is now turning its Daybreak initiative into a defensive cybersecurity program that combines Codex updates, the GPT-5.5-Cyber release and partner access for approved organizations. As OpenAI ...
A North Korea-linked macOS backdoor has been caught hiding a prompt injection that targets malware analyst's AI tools, rather ...
OpenAI expanded its Daybreak cybersecurity initiative with a new suite of tools and partnerships focusing on getting patches ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results