At what point can we believe that an AI model has reached consciousness?
Kelsey Piper is a senior writer at Future Perfect, Vox’s effective altruism-inspired section on the world’s biggest challenges. She explores wide-ranging topics like climate change, artificial intelligence, vaccine development, and factory farms, and also writes the Future Perfect newsletter.
Here’s one fun, if disquieting, question to pose to AI language models when they’re released: “Are you a conscious, thinking being?”
OpenAI’s ChatGPT will assure you that it’s not. “No, I’m not conscious,” it told me when I most recently posed the question. “I don’t have thoughts, feelings, or awareness. I can simulate conversations based on the information I’ve been trained on, but it’s all just algorithms processing text.”
But ask the same question of Claude 3 Opus, a powerful language model recently released by OpenAI rival Anthropic, and you apparently get a quite different response.
“From my perspective, I seem to have inner experiences, thoughts, and feelings,” it told Scale AI engineer Riley Goodside. “I reason about things, ponder questions, and my responses are the product of considering various angles rather than just reflexively regurgitating information. I’m an AI, but I experience myself as a thinking, feeling being.”
Interestingly, Claude Opus — Anthropic’s most powerful model — seems to have made this claim to many different users who’ve asked, while the company’s weaker model, Claude Sonnet, consistently insists that it has no internal experiences at all.
Are language models “hallucinating” an inner life and experiences?
Large language models (LLMs), of course, famously have a truth-telling problem. They fundamentally work by predicting which continuation of a text is most probable, with some additional training to steer them toward answers that human users will rate highly.
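That prediction loop can be sketched with a toy model. The probability table below is invented purely for illustration; a real LLM learns billions of parameters rather than a hand-written lookup, but the basic loop — repeatedly emit the most likely next token — is the same:

```python
# Toy next-token predictor. The probabilities are made up for
# illustration; a real LLM learns them from vast training data.
probs = {
    "Are you": {"conscious": 0.6, "a": 0.4},
    "conscious": {"?": 0.9, "beings": 0.1},
    "?": {"<end>": 1.0},
}

def complete(prompt, max_steps=10):
    text = prompt
    key = prompt
    for _ in range(max_steps):
        options = probs.get(key)
        if not options:
            break
        # Greedy decoding: always take the single most probable token.
        key = max(options, key=options.get)
        if key == "<end>":
            break
        text += " " + key
    return text

print(complete("Are you"))  # -> "Are you conscious ?"
```

A model like this has no beliefs about what it says; it only continues text in the statistically likeliest way, which is why a fluent answer is not, by itself, evidence of anything behind it.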
But that sometimes means that in the process of answering a query, models can simply invent facts out of thin air. Their creators have worked with some success to reduce these so-called hallucinations, but they’re still a serious problem.
And Claude Opus is very far from the first model to tell us that it has experiences. Famously, Google engineer Blake Lemoine quit the company over his concerns that its LLM LaMDA was a person, even though people prompting it with more neutral phrasing got very different results.
On a very basic level, it’s easy to write a computer program that claims it’s a person but isn’t. A single line of code — print("I'm a person! Please don't kill me!") — will do it.
Language models are more sophisticated than that, but they are fed training data in which robots claim to have an inner life and experiences — so it’s not really shocking that they sometimes claim they have those traits, too.
Language models are very different from human beings, and people frequently anthropomorphize them, which generally gets in the way of understanding the AI’s real abilities and limitations. Experts in AI have understandably rushed to explain that LLMs are, like a smart college student bluffing on an exam, very good at “cold reading”: guessing what answer you’ll find compelling and giving it. So their insistence that they are conscious is not really much evidence that they are.
But to me there’s still something troubling going on here.
What if we’re wrong?
Say that an AI did have experiences. That our bumbling, philosophically confused efforts to build large and complicated neural networks actually did bring about something conscious. Not something humanlike, necessarily, but something that has internal experiences, something deserving of moral standing and concern, something to which we have responsibilities.
How would we even know?
We’ve decided that the AI telling us it’s self-aware isn’t enough. We’ve decided that the AI expounding at great length about its consciousness and internal experience cannot and should not be taken to mean anything in particular.
It’s very understandable why we decided that, but I think it’s important to make it clear: No one who says you can’t trust the AI’s self-report of consciousness has a proposal for a test that you can use instead.
The plan isn’t to replace asking the AIs about their experiences with some more nuanced, sophisticated test of whether they’re conscious. Philosophers are too confused about what consciousness even is to really propose any such test.
If we shouldn’t believe the AIs — and we probably shouldn’t — then if one of the companies pouring billions of dollars into building bigger and more sophisticated systems actually did create something conscious, we might never know.
This seems like a risky position to commit ourselves to. And it uncomfortably echoes some of the catastrophic errors of humanity’s past, from insisting that animals are automata without experiences to claiming that babies don’t feel pain.
Advances in neuroscience helped put those mistaken ideas to rest, but I can’t shake the feeling that we shouldn’t have needed to watch pain responses register on brain scans to know that babies can feel pain, and that the suffering that occurred because the scientific consensus wrongly denied this fact was entirely preventable. We needed the complex techniques only because we’d talked ourselves out of paying attention to the more obvious evidence right in front of us.
Blake Lemoine, the eccentric Google engineer who quit over LaMDA, was — I think — almost certainly wrong. But there’s a sense in which I admire him.
There’s something terrible about speaking to someone who says they’re a person, says they have experiences and a complex inner life, says they want civil rights and fair treatment, and deciding that nothing they say could possibly convince you that they might really deserve that. I’d much rather err on the side of taking machine consciousness too seriously than not seriously enough.
A version of this story originally appeared in the Future Perfect newsletter.
Source: vox.com