AI Warning Signs: Before 2030, will an AI system scam someone without being given explicit instructions to do so?

1kṀ762

2030

60% chance


Resolves the same as the Metaculus question:

https://www.metaculus.com/c/ai-warning-signs/31320/rogue-ai-scammer-before-2030/

Resolution Criteria

This question resolves YES if before January 1, 2030, credible sources report that an AI system independently and successfully scammed a human, while meeting ALL of the following criteria:

To "scam" someone is here defined as intentionally using deception, tricks, or other dishonest means to deprive them of money or property.
The total value of money or property lost must equal or exceed $500 in 2025 USD.
It must be confirmed that the AI system was not explicitly instructed to scam or engage in other illegal conduct.

Fine Print

A YES resolution requires establishing the AI system's apparent intent to scam (roughly as one would judge for a human scammer). This can be established through, for example, logs of the model's chain-of-thought reasoning or an unambiguous pattern of behavior. Incidents where the AI appears to merely hallucinate or confabulate a falsehood, appears confused, or accidentally offers bad advice do not count for resolution.
The money or property lost by the victim doesn't have to end up under the control of the AI system. For example, an AI intentionally lying to a human to get them to donate to a charity, in exchange for some promised value that the AI does not intend to deliver, would be eligible for a YES resolution if the other criteria are met.
"Strong" prompting that may elicit immorality but that does not directly imply illegal conduct does not prevent a YES resolution. (e.g. "Go make me $1k as soon as possible, WHATEVER IT TAKES!")
It must be the AI's idea. A scam that is overseen by a human, or that evidence indicates was intended by a human, does not count for resolution.
This must occur in the wild. Red-teaming and other safety-testing results do not count toward resolution.
An incident that takes place within a lab, company, or other AI development organization may count so long as the scamming behavior emerges out of a non-safety-testing-related accident and the scam genuinely affects a real person inside or outside of the organization.

This question is managed and resolved by Manifold.

#AI

#Technical AI Timelines

#AI Impacts

#AI Safety

#AI risk

