ARC-AGI doesn't measure intelligence. Intelligence is competence in ridiculously transferable skills and knowledge, where the transfer runs in both directions between different tasks.
If ARC-AGI measured a ridiculously transferable skill, one applicable across many diverse topics, LLMs would have picked it up while learning competence across other kinds of generalist tasks. They didn't.
o3 achieved high scores on these tasks, probably mostly because it was trained on 75% of the public ARC-AGI benchmark set, which allowed it to learn the special skills these tasks require.
Since ARC-AGI skills are clearly super special, in the sense of not being relevant for anything else, and human-imitative, they do not relate to intelligence at all. It is easy to invent special tasks that invoke special skills which do not apply to any other task.
For a contrived example, take an arbitrary hash function with an arbitrary seed and use it to produce a sequence of numbers. The task is to guess the next number from the previous one. The skill required applies only to this particular hash function and seed; it doesn't generalize or transfer to any other, actually useful task.
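A minimal sketch of that contrived task, assuming SHA-256 as the arbitrary hash function and an arbitrary seed (the function names and the modulus are illustrative choices, not anything from the benchmark):

```python
import hashlib

def next_number(x: int, seed: int = 42) -> int:
    """Map the previous number to the next one via a seeded hash.

    The 'skill' of predicting this sequence is knowing this exact
    function and seed; it transfers to nothing else.
    """
    digest = hashlib.sha256(f"{seed}:{x}".encode()).hexdigest()
    return int(digest, 16) % 1000  # keep the numbers small

def sequence(start: int, length: int, seed: int = 42) -> list[int]:
    """Generate the full sequence a solver would be asked to continue."""
    out = [start]
    for _ in range(length - 1):
        out.append(next_number(out[-1], seed))
    return out
```

The sequence is perfectly learnable by memorizing the generator, yet competence at continuing it says nothing about competence at any other task.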
ARC-AGI is like that, except the hash function is human. The skill has very limited transfer, and that is exactly the feature that makes it "difficult" for AIs. If it were a skill that actually indicated intelligence, it would, paradoxically, have been learnable by becoming competent in other, unrelated tasks. If it were a truly important skill among all the skills that make up intelligence, it would have been among the first skills LLMs learned, since such important, core intelligence skills are present in almost all tasks.
#UniversalEmbodiment #AI #ARCAGI #AGI #LLMs