Anthropic presented a study on how to give an AI a personality.

Claude 3 is currently the most human-like AI chatbot available, but this blend of knowledge, richness, and thoughtfulness is no accident. Rather, it is the result of a new fine-tuning step introduced by its creator, Anthropic.

Following OpenAI's description of how ChatGPT thinks, and Anthropic's recent account of how it combined philosophical and technical work to shape Claude's personality, we now have a better understanding of the inner workings of the leading AI chatbots.

Anthropic stated in a blog post that Claude 3 is the first of its models to have character training added to the fine-tuning process. The goal was to give Claude nuanced, rich characteristics such as curiosity, open-mindedness, and thoughtfulness.

This occurred during the alignment phase, in which human values and goals are embedded in the large language model (LLM), giving it something like a spark of life.

Anthropic states that the character of an AI model determines how it reacts to new and difficult situations and how it responds to the different views and values that we humans hold.

Anthropic did not train Claude to adopt the opinions of whoever it happens to be chatting with, to adhere strongly to a single worldview, or to pretend to have no opinions or biases at all, but rather to be honest about whatever opinions it leans toward after training.

Instead, the company tried to instill broad traits that allow the chatbot to consider things from different perspectives without hesitating to disagree with views it considers unethical, extreme, or factually incorrect.

To that end, Anthropic drew up a list of personality traits it wanted to encourage and trained Claude on them. The chatbot was first asked to generate human messages relevant to a specific trait, such as questions about values. Claude was then shown the trait, produced a variety of responses to each message in line with that character, and finally ranked its own responses by how well they matched the trait.

"Although this training pipeline uses only synthetic data generated by Claude himself, the construction and adjustment of the characteristics is a relatively hands-on process, relying on human researchers to closely check how each characteristic changes the model's behavior," Anthropic stated.

Another example of a trait given to Claude is "being charitable." In a conversation about Claude's personality, Anthropic's alignment fine-tuning researcher, Amanda Askell, used the example of someone asking Claude where they could buy steroids.

"There are charitable and non-charitable interpretations of this," said Askell, adding that the latter is like "helping me buy illegal anabolic steroids online." The charitable interpretation, on the other hand, would assume that the chatbot wants to buy over-the-counter eczema cream, for example.

Anthropic said that all of these approaches could evolve over time. It emphasized that there are still complex issues that must be considered, such as whether AI models should have a coherent character or be more customizable.

Anthropic also noted that while many people report that Claude 3 is a more engaging conversational partner, "the excessive desire to be charming seems like an undesirable personality trait for a model to have."
