Please add questions about what will happen in 2026 related to AI! I've added some clarifications below. If there is ambiguity, I will resolve according to my best judgement.
Clarifications
"SSI will release a product": It should be generally available in 2026, i.e. no waitlist. It should be an AI product; I’m not counting hats, clothing, etc.
"X will outperform the S&P": As measured at the end of the year. It's not sufficient for X to outperform at some point in the year.
"An LLM will beat me at chess": See this market:
“Epoch AI will estimate that there has been a training run using more than 5e27 FLOP”: according to this source or some other official announcement by the org.
"The METR time horizon will exceed X hours": At 50% success rate, according to this source.
“Frontier Math Tier X >= Y%” refers to the top score on this leaderboard. The current top scores as of 2025-12-21 are 40.7% for Tier 1-3 and 18.8% for Tier 4.
“An open millennium prize problem is solved, involving some AI assistance”: refers to these famously difficult mathematics problems.
“Epoch Capabilities Index >= 200” refers to this metric. The current leading score as of 2025-12-18 is 154.
Update 2025-12-19 (PST) (AI summary of creator comment): "Open source model" is defined as a model where the weights are publicly available.
Update 2025-12-20 (PST) (AI summary of creator comment): "I will think that a Chinese model is the best coding model for a period of at least a week": Cost and speed will not be considered unless they make the model difficult to use. Resolution will be based on how well the model performs on difficult coding tasks encountered by the creator.
Update 2025-12-21 (PST) (AI summary of creator comment): "An LLM will beat me at chess": The creator is rated approximately 1900 FIDE.
@mr_mino then you whoop :) I have confidence in you, oh yeah and just go slightly off script to always beat the AI.
@mr_mino does this involve cost and speed, or is it straight up which model is most able to accomplish tasks I set for it?
@Bayesian I won’t be considering cost or speed unless it really makes it difficult to use. It’s mostly how well it does on difficult coding tasks that I encounter.