By the end of 2025, will any AI beat Pokemon Emerald Version without human assistance?

1kṀ9090

2026

28% chance

This market resolves Yes if any AI completes Pokemon Emerald Version on an unmodified cartridge or ROM by the end of 2025 without human assistance during the playthrough.

The run begins with the selection of "New Game" and ends upon entering the Hall of Fame.
Glitches are allowed.
Training on information related to the game, including gameplay footage, is allowed.
The AI is not allowed to access hidden information about the game state (e.g., RNG seed) that would not be available to a human player.

This question is managed and resolved by Manifold.

#AI

#Gaming

14 Comments

76 Holders

147 Trades

Specialized <10M-parameter RL model beats Pokemon Red, open source. It gets the game state from RAM, though; I think it doesn’t use the video feed at all.

https://drubinstein.github.io/pokerl/

bought Ṁ30 YES

@MingCat Simpler game, specifically trained, and still only gets to Surge. Emerald also has dexterity-based challenges with the bikes.

@OP You're right that there's a lot of improvement that'd be required to beat Emerald. Was Sonnet specifically trained on Pokemon Red/Blue? That would surprise me. Anthropic doesn't usually do that kind of benchmark hacking, I think? And someone could always train a model specifically on Pokemon Emerald as well. I also don't think Emerald is significantly more complex. The puzzle in Lt. Surge's gym, for example, has frustrated plenty of humans. In any case, it's early in the year and this is promising!

@MingCat I interpret the tweet as saying Claude was specifically trained on R/B as a real-world use case. I’m not aware of Pokemon progress as a general AI benchmark.

The Lt. Surge puzzle is still not real-time, and the AI would presumably know the answer, whereas the bike inputs in Emerald present a challenge of real-time control.

@OP The above tweet is a meme, and there was no training on Pokemon.

Claude has never been explicitly trained to play any video games.

https://x.com/AnthropicAI/status/1894419011569344978

Wouldn’t count because it has access to a couple of things from the RAM state, like its current position and the state of its party.

It could, yes, but I am doubtful that anyone will spend the resources solving specifically Pokemon Emerald.

Are there any limits on what the "AI" itself can be? E.g., if it's actually just a human-authored script that doesn't include any machine learning or neural networks, would it still qualify if it meets all the other requirements?

@nottelling2ccc Also, if you're allowed to set the initial state of the system and have guarantees about CPU timing and the like, then wouldn't the game be entirely deterministic? At that point, wouldn't a TAS be viable?

@nottelling2ccc If so, 34% is way too low.

@nottelling2ccc An entirely human-authored script with no machine learning wouldn't count as AI, and setting a known initial state would count as accessing hidden information about the game state.

@NathanShowell "no known initial state" (or at least a random RNG seed at startup) makes sense to me, but the requirement to be "an entirely human-authored script" does not.

Where do we draw the line between "AI" and "not AI"? Would using an OCR program count if it was a convolutional neural network? Would an OCR program count if it was matching the image onscreen to the most similar image in a very tiny (and labeled) dataset? Would a CNN that was trained to behave exactly like the previous model count?

I am confident that this is a 100% solvable problem without using any "machine learning" and any competent programmer could make a bot to solve this game. It's just a matter of time and tedium.

predicted YES

@nottelling2ccc Trying to ban solutions that "aren't machine learning" is A) silly, and B) not effective. You can easily take a "human-authored script" (i.e. non ML solution) and replace enough subsystems with ML counterparts, ship-of-Theseus style.
