Number puzzles like Sudoku are too complex for artificial intelligence (AI), scientists have determined. Machines have even more trouble explaining how they arrived at the solution.
Number puzzles have been a pastime for millennia: they first appeared in ancient China, and newspapers began publishing them in the late 19th century. Sudoku, a puzzle first published in 1986 in the Japanese magazine “Nikoli,” gained global popularity about 20 years ago. Today, the game has millions of fans worldwide, and the various versions of the mobile app alone have been downloaded by approximately 200 million users.
Sudoku involves filling in the empty squares of a 9×9 grid with digits. Each row, each column, and each 3×3 square (the so-called block) into which the grid is divided must contain every digit from 1 to 9 exactly once, so no digit is repeated. Mathematicians from the University of Sheffield (UK) proved in 2005 that there are roughly 6.67 sextillion possible valid Sudoku grids (about 6.67 × 10^21). Other versions of the game also exist; for example, a 6×6 grid must be filled with the digits 1 to 6.
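The rules above are simple enough to check mechanically. The following Python sketch is only an illustration and is not taken from the study: the function name `is_valid_solution`, the sample grid, and the assumption that a 6×6 puzzle uses blocks of 2 rows by 3 columns are all hypothetical choices made here.

```python
def is_valid_solution(grid, box_rows, box_cols):
    """Return True if a completed grid obeys the row, column and block rules.

    box_rows x box_cols is the block shape: 3 and 3 for the classic 9x9 grid,
    2 and 3 (an assumption of this sketch) for a 6x6 grid.
    """
    n = box_rows * box_cols  # side length of the grid
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    digits = set(range(1, n + 1))  # the digits every unit must contain

    # Each row must contain every digit exactly once.
    if any(set(row) != digits for row in grid):
        return False

    # Each column must contain every digit exactly once.
    if any({grid[r][c] for r in range(n)} != digits for c in range(n)):
        return False

    # Each block must contain every digit exactly once.
    for top in range(0, n, box_rows):
        for left in range(0, n, box_cols):
            block = {
                grid[r][c]
                for r in range(top, top + box_rows)
                for c in range(left, left + box_cols)
            }
            if block != digits:
                return False
    return True


# A made-up, fully solved 6x6 grid used only to exercise the checker.
EXAMPLE_6X6 = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]

if __name__ == "__main__":
    print(is_valid_solution(EXAMPLE_6X6, 2, 3))
```

Because each row, column, and block has exactly as many cells as there are digits, comparing each one against the full set of digits catches both a repeated digit and a missing one.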
Now, it turns out that Sudoku poses a challenge for artificial intelligence. Although AI is making enormous progress in areas such as analyzing large data sets, generating text, images, and videos, and translation, logical tasks are its weak point. This was confirmed by researchers from the University of Colorado at Boulder (USA), whose article on the subject appeared in the “ACL Anthology,” a collection of over 110,000 papers maintained by the Association for Computational Linguistics (ACL).
As the paper's lead author, computer science and machine learning expert Anirudh Maiya, explained, solving Sudoku involves several important elements. “You have to proceed step by step, constantly re-evaluate the number fields, and consistently follow the rules. Puzzles like these are fun, but they also provide an ideal microcosm for studying decision-making in machine learning,” he said.
For the study, Maiya and his team created 2,300 Sudoku puzzles of varying difficulty on a 6×6 grid. The researchers then gave them to several large language models (LLMs), including o1, Llama-3.1, Gemma-2, and Mistral, to solve.
The experiment showed that the task was too difficult for all AI models—they managed to solve a total of 0.4% of the puzzles. The researchers attribute this to the fact that AI doesn't think logically, but rather determines solutions based on probability. Therefore, rule-based and reasoning-based tasks are difficult for AI models. “AI models struggle to simultaneously consider all the limiting factors in a number grid,” the authors explained.
Among the LLMs tested, o1 performed best, solving approximately 65% of the Sudoku puzzles. However, as the difficulty of the puzzles increased, its success rate also dropped.
Even more problems arose when the researchers asked the AI to explain how it arrived at the solution to the puzzle. Across all the models tested, the explanations correctly justified the specific numbers entered only 5% of the time. Often, the answers were incorrect or unclear. “For example, the AI said, 'There can't be a 2 here because there's already a 2 in this row,' which wasn't true,” said study co-author Dr. Ashutosh Trivedi.
He added that in some situations, the AI ignored the number combinations on the board or came up with absurd explanations. In one such case, during a conversation about Sudoku, one of the models suddenly gave a weather forecast. “The AI was completely confused and reacted in a bizarre way,” Dr. Trivedi said.
According to the authors, the study's results show that despite AI's impressive achievements, it cannot be fully relied upon, especially in tasks requiring precise reasoning. “Many people talk about AI models' new abilities that one might not expect. However, at the same time, it's not surprising that they still perform poorly in many tasks,” concluded Anirudh Maiya. (PAP)
abu/ agt/