How do you test an AI that's getting smarter than us? A new group is creating the “toughest test known to mankind”

As AI gets smarter and smarter (even breaking rules to prove its prowess), it is getting harder to pin down its limits: tests that pushed GPT-4o to its edge are proving easy for o1-preview. The idea that AI might become too smart for humanity is an understandable worry. We may still be far from a Skynet-level catastrophe, but the possibility has clearly crossed the minds of some technology experts.

A nonprofit organization called the Center for AI Safety (CAIS) is collecting the most vexing questions it can find for AI to answer. The idea is that these tough questions will form “humanity's last exam” and serve as the hardest hurdle yet for AI to clear.

All major AI research labs, and the large technology companies with AI research departments, have established AI safety committees or their equivalent. Many have also agreed to external oversight of new frontier models before release. Finding questions and tasks that properly test those models is an important part of that safety work.

The submission form states, “Together, we will gather the hardest and broadest questions ever asked.” It asks users to “think of something you know that would stump current artificial intelligence (AI) systems,” so that the questions can be used to assess the capabilities of AI systems in the years to come.

According to Reuters, existing models already struggle with many of the questions submitted, and their answers are scattershot at best. For example, the question “How many positive integer Coxeter-Conway friezes of type G2 are there?” drew three different answers from three AI models: 14, 1, and 3.

OpenAI's o1 family, currently available in preview and mini versions, exhibits an IQ of about 120 and solves doctoral-level problems with relative ease. And these are the “lightest” o1 models; with stronger models expected in the coming year, finding problems that can still challenge them is a top priority for the AI safety community.

According to Dan Hendrycks, director of the Center for AI Safety, the questions will be used to build a new benchmark for testing future models, and the authors of the selected questions will be credited as co-authors of the benchmark. The submission deadline is November 1, and the best questions are eligible for a share of a $500,000 prize pool.
