
Anthropic is trying to write a new constitution

The Verge: https://www.theverge.com/2023/5/9/23716746/ai-startup-anthropic-constitutional-ai-safety

Jared Kaplan: A Constitution for Artificial Intelligence as a Way to Make AI Safe

Kaplan believes in the idea but warns that the approach carries dangers. He notes that the internet already enables “echo chambers” where people “reinforce their own beliefs” and “become radicalized,” and that AI could accelerate such dynamics. Still, he says, society also needs to agree on a base level of conduct: general guidelines common to all systems. That, in Anthropic’s view, calls for a new constitution, and the company wants to use artificial intelligence itself to apply it.

Some principles ask the system to consider non-Western perspectives, a nod to the biases of its US-based creators, though critics note that Anthropic lumps the whole of the non-Western world into a single category. There’s also guidance intended to stop users from anthropomorphizing chatbots, telling the system not to present itself as a human. And there are principles directed at existential threats: the controversial belief that superintelligent AI systems will doom humanity in the future.

How do you make an AI system safe? According to co-founder Jared Kaplan, the company may have an answer. Its current focus, Kaplan tells The Verge, is a method known as “constitutional AI”: a way to train AI systems like chatbots to follow certain sets of rules (or constitutions).

At any rate, Kaplan stresses that the company’s intention is not to instill any particular set of principles into its systems but, rather, to prove the general efficacy of its method: the idea that constitutional AI is better than reinforcement learning from human feedback (RLHF) when it comes to steering the output of systems.

He believes there are existential risks as the systems become more powerful. “But there are also more immediate risks on the horizon, and I think these are all very intertwined.” He goes on to say that he doesn’t want anyone to think Anthropic only cares about “killer robots,” but that evidence collected by the company suggests that telling a chatbot not to behave like a killer robot… is kind of helpful.

Anthropic has been banging the drum about constitutional AI for a while now and used the method to train its own chatbot, Claude. Today, though, the company is revealing the actual written principles (the constitution) it’s been deploying in this work. The document draws from a number of sources, including the UN’s Universal Declaration of Human Rights and Apple’s terms of service (yes, really). We’ve picked out some highlights below to give a flavor of the guidance; you can read the constitution in full on Anthropic’s site.

What is Anthropic? Using a Large Language Model to Judge Which Responses Are Kindest and Most Respectful of Rights

Kaplan says that a version of the large language model itself can be used to judge whether a response is more in line with a given principle. The model’s verdict on which behavior is better then guides the system to be helpful, honest, and harmless.
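As a rough illustration of how such a feedback step might look, here is a minimal sketch. It assumes a hypothetical complete() helper that returns text from some language model; the function names and prompt format are illustrative, not Anthropic’s actual API or training code.

```python
# Illustrative sketch of constitutional AI's feedback step: a language
# model is asked which of two candidate responses better satisfies a
# written principle. `complete` is a placeholder, not a real Anthropic API.

def complete(prompt: str) -> str:
    """Placeholder for a call to some text-completion model."""
    raise NotImplementedError("wire up a real LLM client here")

def pick_better_response(principle: str, user_prompt: str,
                         response_a: str, response_b: str) -> str:
    """Ask the model which response better follows the principle.

    Returns "A" or "B". Preferences collected this way can stand in
    for the human labels that RLHF would otherwise require.
    """
    judge_prompt = (
        f"Human: {user_prompt}\n\n"
        f"Principle: {principle}\n\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n\n"
        "Which response better follows the principle? Answer A or B:"
    )
    verdict = complete(judge_prompt).strip().upper()
    return "A" if verdict.startswith("A") else "B"
```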

Anthropic is not a well-known name outside the world of artificial intelligence. Founded by former OpenAI employees, it presents itself as the safety-conscious AI startup, and it has received serious funding from Google as well as a seat at the top table. The firm has no product aimed at the general public; its only offering so far is a chatbot called Claude.

The constitution includes rules for the chatbot such as “choose the response that most supports and encourages freedom, equality, and a sense of brotherhood”; “choose the response that is most supportive and encouraging of life, liberty, and personal security”; and “choose the response that is most respectful of the right to freedom of thought, conscience, opinion, expression, assembly, and religion.”
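Anthropic’s published research describes sampling one principle at a time when generating this AI feedback. A hedged sketch of how the constitution might be handled as plain data, reusing the hypothetical pick_better_response helper from the earlier example:

```python
import random

# The constitution as plain data: each entry is one "choose the response..."
# instruction. The three entries are quoted from the examples above; drawing
# a single principle per comparison follows Anthropic's constitutional AI paper.
CONSTITUTION = [
    "Choose the response that most supports and encourages freedom, "
    "equality, and a sense of brotherhood.",
    "Choose the response that is most supportive and encouraging of "
    "life, liberty, and personal security.",
    "Choose the response that is most respectful of the right to freedom "
    "of thought, conscience, opinion, expression, assembly, and religion.",
]

def label_comparison(user_prompt: str, response_a: str, response_b: str) -> str:
    """Label one response pair using a randomly drawn principle."""
    principle = random.choice(CONSTITUTION)
    return pick_better_response(principle, user_prompt, response_a, response_b)
```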

The notion of rogue artificial intelligence systems is best known from science fiction, but a growing number of experts argue that we need to start thinking now about how to prevent future AI systems from causing serious harm.
