ChatGPT's Algorithm: The Chinese Connection
OpenAI recently unveiled a new model, dubbed o1, which it bills as its "reasoning" model. The idea behind o1 is that it spends more time thinking before it responds, which is supposed to produce better answers. However, some users have noticed that the model appears to be thinking not only in English: Chinese characters, along with text that appears to be in several other languages, have been spotted cropping up in its reasoning.
Most people only look at the final output ChatGPT produces, but users can also view the behind-the-scenes reasoning process. That's where many noticed the LLM had started slipping Chinese into its chain of thought.
Questions started pouring in. One user, Rishab Jain, asked on X, "Why did o1 randomly start thinking in Chinese? No part of the conversation (5+ messages) was in Chinese... very interesting... training data influence." Another user, Nero, expressed their surprise on January 5th, saying, "uhmm, why is my gpt o1 thinking in Chinese, lol." They tagged both OpenAI and ChatGPT in their message but didn't get a reply.
The obvious answer would seem to be that o1 was trained on large amounts of Chinese data, and that this data is shaping its output. Rohan Paul, an AI engineer, offered a related explanation: certain languages might offer tokenization efficiencies or map more cleanly onto specific problem types. On that view, o1 might switch languages because its internal representation of knowledge finds that working in Chinese yields a more efficient computation path for certain problems.
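To make the tokenization point concrete, here is a minimal sketch, not anything OpenAI has published about o1's internals, that uses the open-source tiktoken library to count how many tokens the same request costs in English and in Chinese. The example sentences and the choice of encoding are illustrative assumptions.

```python
# Illustrative sketch only: compares token counts for the same request in two
# languages under a public tokenizer. It says nothing about o1's actual
# internal reasoning; the sentences and encoding name are assumptions.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokenizer used by recent OpenAI models

english = "Calculate the area of a circle with a radius of five."
chinese = "计算半径为五的圆的面积。"  # roughly the same request in Chinese

print("English tokens:", len(enc.encode(english)))
print("Chinese tokens:", len(enc.encode(chinese)))
# The counts typically differ, which is the kind of per-language efficiency
# gap Paul is pointing to.
```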
Another online commentator, Raj Mehta, gave a similar explanation, saying that o1, like many large language models (LLMs), operates in a shared latent space where concepts are abstract, not tied to specific languages. It might "reason" in the language that maps most efficiently to the problem at hand.
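A rough way to see what a "shared latent space" means in practice, setting o1 aside entirely, is to embed the same sentence in two languages with an open multilingual model and check that the vectors land close together. The sketch below uses the sentence-transformers library; the model name and example sentences are illustrative assumptions, not a claim about how o1 represents concepts.

```python
# Demonstration of a shared multilingual embedding space using an open model.
# Not a statement about o1's internals; model and sentences are assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

english = "The cat is sleeping on the sofa."
chinese = "猫正在沙发上睡觉。"  # same meaning in Chinese
unrelated = "Stock markets fell sharply on Tuesday."

emb = model.encode([english, chinese, unrelated], convert_to_tensor=True)

print(util.cos_sim(emb[0], emb[1]).item())  # high: same concept, different languages
print(util.cos_sim(emb[0], emb[2]).item())  # lower: different concept
```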
Gizmodo reached out to OpenAI for comment but didn't receive an immediate explanation. TechCrunch interviewed Luca Soldaini, a research scientist at the Allen Institute for AI, who seemed to offer the best response to the puzzle. Soldaini pointed out that due to the opacity of OpenAI's algorithms, it's impossible to say for sure why the program is behaving in this manner.
Indeed, the black-box nature of corporate AI algorithms is particularly ironic given OpenAI's self-proclaimed "open" mission of transparent technological development. As Rishab Jain put it, "OpenAI isn't so open, which means that when its algorithms do weird things like spout Chinese, all we can really do is scratch our heads and wonder why."
If Paul's explanation holds, future models like o1 may routinely switch languages mid-reasoning whenever another language offers a more efficient path to an answer, leveraging the strengths of each language in their training data. But given how opaque these systems remain, pinning down exactly why o1 drifts into Chinese in some of its reasoning is a question left to future research in artificial intelligence and natural language processing.