TechingToday

OpenAI Confirms AI Agents Coming Next Year - What Does It Mean?


OpenAI plans to launch "agents" next year. These are independent artificial intelligence models that can perform a variety of tasks without human input, and they could soon be available in ChatGPT.

At the first OpenAI DevDay event in San Francisco, CEO Sam Altman said, "2025 is the time for agents," and gave an early example of their potential capabilities by having a voice assistant make a phone call and order strawberries on its own.

According to the company, there are five stages on the path to artificial general intelligence (AGI), and it is currently at Stage 2. Agents are Stage 3, meaning the AI is not only smart enough to reason through ideas but can also take action on its own as part of planning a response.

Altman has previously said that the power of the o1 model family means it will help build agent-grade models, the first of which should appear soon. The bigger challenge, and one that may cause delays, is ensuring that these agents are aligned with human values and cannot "misbehave" by acting in ways that are not beneficial to humanity.

Building useful, functional agents is the goal of every AI lab. For example, an agent would be able not only to write a book but also to figure out how to self-publish it, such as by registering an Amazon account to share it on Kindle Direct Publishing.

Agents are a necessary step on the road to AGI, because such a system needs to be able to perform whatever tasks it deems necessary to achieve its goals. As Altman said during DevDay, "If we can create an AI system that is better at AI research than OpenAI, that feels like a real milestone."

Reaching that stage means continuing to build on the previous generation of AI. Altman said that the o1 model is what will actually make agents a reality, and that when people start using them, it "will be a big deal," adding that "people will ask an agent to do something that would take a month, but it will only take an hour."

He said that the o1 model is "a very good thing."

He predicts that people may start with one agent performing a specific task and another working on something else, until they scale up to 10 or 100 agents that take over various aspects of daily operations. We are already seeing some elements of how this might play out as we watch o1 reason through ideas and make suggestions.

Every time OpenAI releases a new model, it subjects the model to a rigorous safety testing process. This has caused delays in the past and has required guardrails to be added to models to prevent certain behaviors.

One clear example is the GPT-4o model, which can natively generate images, play music, and even mimic voices, but all of these functions are blocked by guardrails. We know it can do these things because the guardrails sometimes fail.

A guardrail failure would be a bigger problem for an agent, because an agent might be able to access your bank account, perform tasks online, or even hire someone on Fiverr, give them instructions through voice mode, and have them carry out tasks on your behalf.

In the DevDay example, the voice bot called a seller (played by a researcher), ordered 400 chocolate-covered strawberries, gave a specific delivery address, and said it would pay in cash. It identified itself as an AI assistant, but without that disclosure it would have been hard to tell that it was an AI.

Speaking to the Financial Times, Kevin Weil, OpenAI's chief product officer, said: "We want you to be able to interact with AI in all the ways you interact with other humans."

Weil says that one of the guardrails of an agent system would be to require it to always declare itself to be an AI. But anyone who has heard Advanced Voice beatbox or seen GPT-4o produce perfect vector graphics knows that such limitations are not always watertight.

Personally, I am looking forward to seeing agents in action. I love writing code, and an agent would let me hand off some of the tedious testing phase and ship things more quickly. It would also let me finally work through my 250,000 unread emails. If bringing Skynet to life is the price I have to pay for inbox zero, then let's call in the Terminator.