AI is Better at Fixing Itself Than Helping Humans - Introducing OpenAI's New Bug Hunter

OpenAI has created a new model called CriticGPT, designed to find errors in programming code generated by GPT-4.

In a blog post, OpenAI announced that it had trained a new bug-finding model based on GPT-4 and found that reviewers who used CriticGPT to check code written by ChatGPT outperformed those working without AI help 60% of the time.

While one should always double-check anything created by AI, this is a step toward improving the quality of the output; according to OpenAI, users can now have more confidence in code created by chatbots.

Nevertheless, OpenAI added the disclaimer that "CriticGPT's suggestions are not always correct."

There are at least two main ways in which this new model from OpenAI is good news for ChatGPT users. First, while output generated by AI chatbots should still be checked by human eyes, AI assistants specifically trained to spot mistakes will ease some of the burden of that monitoring task.

Second, OpenAI has begun integrating CriticGPT-like models into its "reinforcement learning from human feedback" (RLHF) alignment pipeline to help humans supervise AI in difficult tasks.

According to OpenAI, an important part of this process is for people called AI trainers to evaluate different ChatGPT responses against each other. This process has worked relatively well so far, but as models like ChatGPT become more accurate and their mistakes more subtle, spotting inaccuracies may become increasingly difficult for AI trainers.
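
As a rough illustration of the kind of comparison data RLHF relies on, the sketch below shows what a single trainer judgment might look like. The Python structure and field names are hypothetical and are not taken from OpenAI's announcement.

```python
from dataclasses import dataclass

# Hypothetical illustration of an RLHF comparison record: the field names
# and structure are assumptions, not OpenAI's actual data format.
@dataclass
class PreferenceRecord:
    prompt: str        # the task shown to ChatGPT
    response_a: str    # first candidate answer
    response_b: str    # second candidate answer
    preferred: str     # "a" or "b", chosen by the human AI trainer

def record_preference(prompt: str, response_a: str, response_b: str, preferred: str) -> PreferenceRecord:
    """Store a single human judgment; many such records feed reward-model training."""
    assert preferred in ("a", "b")
    return PreferenceRecord(prompt, response_a, response_b, preferred)

# Example: a trainer judges which of two answers is better.
example = record_preference(
    prompt="Write a function that reverses a string.",
    response_a="def rev(s): return s[::-1]",
    response_b="def rev(s): return reversed(s)  # returns an iterator, not a string",
    preferred="a",
)
```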

"This is a fundamental limitation of RLHF," OpenAI states, "and as models gradually become more knowledgeable than humans who can provide feedback, it may become increasingly difficult to adjust them.

OpenAI already explained last year that future generations of AI systems may be too complex for humans to fully understand. If a model generates a million lines of complex code, can a human reliably determine whether that code is safe to execute?

To train CriticGPT, the model had to see inputs containing mistakes and learn to critique them. AI trainers manually inserted errors into code written by ChatGPT and wrote sample feedback as if they had found the mistakes themselves, and this data was used to train the model. Experiments were then conducted to see whether CriticGPT could catch both the manually inserted bugs and bugs that ChatGPT had introduced by accident.
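
The following sketch illustrates what such a tampered training example could look like; the Python structure and names are hypothetical assumptions for illustration, not OpenAI's actual data format.

```python
from dataclasses import dataclass

# Hypothetical shape of a "tampered" training example: an AI trainer takes code
# written by ChatGPT, deliberately inserts a bug, and writes the critique they
# would expect a good reviewer to produce. Field names are assumptions.
@dataclass
class TamperedExample:
    original_code: str        # code as ChatGPT wrote it
    tampered_code: str        # same code with a bug inserted by the trainer
    reference_critique: str   # feedback written as if the trainer had found the bug

example = TamperedExample(
    original_code="def average(xs):\n    return sum(xs) / len(xs)",
    tampered_code="def average(xs):\n    return sum(xs) / (len(xs) - 1)",
    reference_critique="Dividing by len(xs) - 1 computes the wrong mean and "
                       "gives a ZeroDivisionError for single-element lists.",
)

# CriticGPT would then be trained to produce critiques like `reference_critique`
# when shown `tampered_code`, and evaluated on whether it also catches bugs
# that ChatGPT introduced naturally.
```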

AI trainers preferred CriticGPT's feedback over ChatGPT's own feedback in 63% of cases involving naturally occurring bugs, in part because CriticGPT was less likely to hallucinate problems that did not exist.
