Behavior Model Training

Model Training Behavior

Q: How can a high school runner become faster toward the end of the season when his only hard training has been done during races? A: Yakovlev's Model. So what the heck is Yakovlev's Model? It's a ...

Time

Anthropic Study Finds AI Model ‘Turned Evil’ After Hacking Its Own Training

AI models can do scary things. There are signs that they could deceive and blackmail users. Still, a common critique is that these misbehaviors are ...

Mashable

Anthropic AI research model hacks its training, breaks bad

A new paper from Anthropic, released on Friday, suggests that AI can be "quite evil" when it's trained to cheat. Anthropic found that when an AI model learns to cheat on software programming tasks and ...

ZDNet

Anthropic wants to stop AI models from turning evil - here's how

New research from Anthropic identifies model characteristics, called persona vectors. This helps catch bad behavior without impacting performance. Still, developers don't know enough about why models ...
