The fascinating evolution of typing Chinese characters
This story first appeared in China Report, MIT Technology Review’s newsletter about technology developments in China.
The idea of downloading a third-party keyboard to your phone may seem unnecessary to most people, but in China it’s the norm.
Chinese is the only modern language that’s logographic, meaning that the way a character is written can be completely separate from its pronunciation (Japanese, Korean, and Vietnamese have their own variations of Chinese characters). Because of that, typing with only a default keyboard would be incredibly difficult. So today, 800 million people in China use smart keyboard software that predicts what a user wants to type.
But a strong reliance on this technology also presents a security risk: most keyboard apps transmit keystrokes to the cloud to enable better text prediction, creating an opportunity for the content to be intercepted if the apps don’t have strong enough encryption protocols.
This week, I reported on one such encryption loophole found in Sogou, one of China’s most popular third-party keyboard apps. A group of researchers at the Citizen Lab, a University of Toronto–affiliated research group, managed to intercept almost everything they typed into Sogou by deploying a two-decade-old exploit.
Not only can this kind of software endanger people’s personal and financial information, but—perhaps more important—it can compromise otherwise encrypted messages in apps like Signal, and allow them to be caught by police or malicious actors.
For more information on this particular loophole and the broader implications, you can read the story here.
But for the newsletter, I want to take you all on a geeky journey into the history of keyboard apps—or input method editors (IMEs), as they are formally called. IMEs are so ubiquitous and fundamental today that it’s easy to forget how much hard work was put into their creation. And they’re a fascinating example of how innovations can bridge the gap between the digital world and the real world.
In the ’80s, there was no way of processing Chinese characters with the personal computers on the market. Even after the laborious process of digitizing Chinese characters to be displayed on computer screens, a big question remained: How do you type those characters? Particularly, how do you match the tens of thousands of Chinese characters to the 26 letters on a QWERTY keyboard?
The first attempt was vastly different from the keyboard apps today, and centered on how Chinese characters are written.
In August 1983, exactly 40 years ago, a Chinese engineer named Wang Yongmin developed the first popular way to input Chinese characters into a computer: Wubi. He did it by breaking down a Chinese character into different strokes and assigning several strokes to each letter on the QWERTY keyboard.
For example, the Chinese character for dog, 犬, has several shapes in it: 犬, 一, 丿, and 丶. These shapes were matched with the keys D, G, T, and Y, respectively. So when a user typed “DGTY,” Wubi input software would match that to the character 犬.
Wubi was able to match every Chinese character with a keystroke combination of at most four QWERTY keys. It’s considered one of the fastest ways to type Chinese, but the downside is also pretty obvious: users need to memorize which keys correspond to which strokes, so the learning curve is quite steep. (One way people have remembered the keyboard designations? Jingles!)
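For readers who think in code, a shape-based method like Wubi can be pictured as a plain lookup from a short key sequence to a character. The sketch below is only an illustration: the names `WUBI_TABLE` and `wubi_lookup` are made up for this example, and the table holds just the one code mentioned above, whereas a real Wubi table covers thousands of characters.

```python
from typing import Optional

# Hypothetical one-entry table. "DGTY" -> 犬 is the example from the text;
# a real Wubi code table is far larger.
WUBI_TABLE = {"DGTY": "犬"}

MAX_CODE_LENGTH = 4  # Wubi codes use at most four keys


def wubi_lookup(keys: str) -> Optional[str]:
    """Return the character for a key sequence, or None if the code is unknown."""
    code = keys.upper()
    if not 1 <= len(code) <= MAX_CODE_LENGTH:
        raise ValueError("a Wubi code is one to four keys long")
    return WUBI_TABLE.get(code)


print(wubi_lookup("dgty"))  # 犬
```

Because each code points to one entry, there is no candidate list to scroll through, which is one way to see why the method can be fast once the key assignments are memorized.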
The next step in the evolution of Chinese IMEs was the invention of typing by phonetic spelling.
It may be hard to believe, but pinyin, the modern way of spelling each Chinese word in a standardized Latin alphabet, was only created in the 1950s. In the ’80s and ’90s, China started to experiment with teaching kids pinyin in school before teaching them how to write Chinese characters. One result was that pinyin became an easier and more widely accepted way to match Chinese characters to the Latin letters on a keyboard.
To stick with the example of the character 犬 (dog), its pronunciation was standardized as quǎn, so typing Q, U, A, N on the standard keyboard would get you this character on your screen.
A large number of pinyin-based IMEs were invented in the ’90s. The most prominent was Zhineng ABC, developed in 1993 by Zhu Shoutao, a computer science professor at Peking University. After Microsoft integrated Zhineng ABC as one of the default IMEs in Windows PCs, it became the most widely used one in the country.
But typing by pinyin also has its problems: dozens or hundreds of Chinese characters can share the same phonetic spelling. If you type QUAN, the computer has no way to tell which of 81 characters is the one you want.
So every time you typed a word in Zhineng ABC, you still needed to select the correct character from a long list of potential candidates.
Luckily, they were always displayed in the same order, meaning you’d start to remember where characters you frequently used appeared in the little window.
I can confirm this, as I learned to type with Zhineng ABC. The last character in my name is 毅, spelled yi; and yi happens to be the sound with the most possible matches in Chinese, with hundreds of characters spelled the same way (thanks, Mom and Dad). It was etched in my mind that when I wanted to type 毅 in Zhineng ABC, I needed to scroll to the fourth page and choose the sixth option.
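The fixed-order candidate window can be sketched as a static list split into pages. Everything here is an assumption made for illustration: the candidate order, the page size of five, and the names `CANDIDATES`, `candidate_page`, and `position_of` are not taken from Zhineng ABC itself. The eight characters are real ones pronounced quan, but only a small sample of them.

```python
# Illustrative sample of characters spelled "quan"; the order is made up
# and does not reflect any real input method.
CANDIDATES = {"quan": ["全", "权", "圈", "泉", "拳", "犬", "劝", "券"]}

PAGE_SIZE = 5  # hypothetical number of options shown per page


def candidate_page(pinyin: str, page: int) -> list:
    """Return the options on a 1-based page; the order never changes."""
    chars = CANDIDATES.get(pinyin.lower(), [])
    start = (page - 1) * PAGE_SIZE
    return chars[start:start + PAGE_SIZE]


def position_of(pinyin: str, char: str) -> tuple:
    """Return the (page, option) a user would memorize, both 1-based.

    Raises KeyError for an unknown spelling and ValueError for a
    character that is not in the list.
    """
    index = CANDIDATES[pinyin.lower()].index(char)
    return index // PAGE_SIZE + 1, index % PAGE_SIZE + 1


print(candidate_page("quan", 2))  # ['犬', '劝', '券']
print(position_of("quan", "犬"))  # (2, 1)
```

Since the list never reorders, a character's page and option stay the same forever, which is what makes a position memorizable in the first place.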
Obviously, that’s not efficient. In fact, it’s slower to type in Zhineng ABC than in Wubi. But the next generation of keyboard apps quickly surpassed its predecessors.
In 2006, Sogou was released, essentially combining the foundation of pinyin typing and the tech of a search engine. Just as search engines recommend content that’s closest to what people are asking about, keyboard software can predict what users may want to type.
With Sogou, the candidate characters and words are no longer displayed in a permanent order; the order changes based on a user’s typing history and what’s in the news. For example, now that I’ve typed 毅 a few times in this newsletter already, Sogou remembers that and puts it at the top whenever I type yi.
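One simple way to picture history-based ordering is to count how often each candidate has been picked and sort by that count. This is a minimal sketch under stated assumptions, not Sogou's actual algorithm, which is not described here and also draws on signals like the news. The class name `AdaptiveCandidates` and the four-character sample list for yi are made up for the example.

```python
from collections import Counter


class AdaptiveCandidates:
    """Toy candidate list that moves frequently chosen characters up."""

    def __init__(self, base_order):
        # base_order maps a pinyin spelling to its default candidate order.
        self.base_order = {k: list(v) for k, v in base_order.items()}
        self.usage = Counter()  # (pinyin, char) -> times selected

    def select(self, pinyin, char):
        """Record that the user picked this character for this spelling."""
        self.usage[(pinyin, char)] += 1

    def suggest(self, pinyin):
        """Most-used first; sorted() is stable, so ties keep the default order."""
        chars = self.base_order.get(pinyin, [])
        return sorted(chars, key=lambda c: -self.usage[(pinyin, c)])


# Illustrative sample of characters spelled "yi", in a made-up default order.
ime = AdaptiveCandidates({"yi": ["一", "以", "意", "毅"]})
ime.select("yi", "毅")
ime.select("yi", "毅")
print(ime.suggest("yi"))  # ['毅', '一', '以', '意']
```

Counting selections is the crudest possible signal; a production keyboard would presumably weigh recency, surrounding words, and much more.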
Many other innovative IMEs were invented around the same time as Sogou. Some tried to combine the methods based on shapes with those based on spelling. Others enabled users to write a Chinese character directly on the device, since trackpads and touch screens were coming into use.
But over time, these methods were gradually abandoned in favor of the much more efficient typing in smart keyboard apps like Sogou, which became the foundation of how Chinese people interact with technology and with each other.
They became a necessity for people’s everyday lives—but this unfortunately opened everyone to a greater security risk. Even if more people knew about these vulnerabilities, it’s hard to imagine Chinese users would ever ditch the apps; instead, maybe it’s time users start demanding better security practices and more transparency from these companies.
(There are many more fascinating aspects to the historical relationship between the Chinese language and technology. For example, people in Taiwan and Hong Kong have developed their own ways of typing Chinese characters. For a great introduction, I’d recommend the book Kingdom of Characters by Jing Tsu, a professor of East Asian languages and literature at Yale.)
What else do you want to know about Chinese keyboard apps? Ask me any questions at zeyi@technologyreview.com.
Catch up with China
1. A landmark agreement between the US and China to cooperate on science and technology is set to expire on August 27 after being in effect for 44 years. Its end would deal a heavy blow to the future of scientific research. (Wall Street Journal $)
2. Xiong’an, the Chinese city near Beijing that’s being built as a flagship smart city, is experiencing particularly devastating rain this summer, leaving some people to wonder if the choice of location was a mistake. (CNN)
- Just how bad was the rain in and around Beijing? One county recorded 1.6 years’ worth of rain in just three days. (Reuters $)
3. Huawei will provide surveillance systems for the Taliban to install across Afghanistan. (Kabul Now)
4. To balance the increasing demand for burial space and the declining supply of land, Beijing is turning its cemeteries vertical and digital. (Bloomberg $)
5. A Chinese artist is re-creating the old houses demolished in the country’s modernization process, one miniature at a time. (New York Times $)
6. One Chinese AI-powered chatbot allowed users to create an ideal partner to talk to every day. When the app went out of business, the users were heartbroken. (Rest of World)
7. Dozens of Chinese companies are developing their own version of “miracle” weight-loss drugs like Wegovy that are popular in the West. (Financial Times $)
8. American intelligence agencies issued a warning that their Chinese and Russian counterparts are now targeting space companies and their employees. (New York Times $)
Lost in translation
During the height of the pandemic, almost every Chinese province was building 方舱 (fangcang), makeshift hospitals where covid patients were quarantined. So what happened to them? Reporters at the Chinese publication Southern Weekly combed through hundreds of government procurement reports across the country and found that local governments are spending millions of dollars to dismantle or repurpose them—or, in some cases, to build more of them.
At least four makeshift hospitals are being shut down and the land returned to its original use, and the construction of five new ones has been halted. Equipment and construction materials from those hospitals are now being resold online at low prices. Meanwhile, 24 existing hospitals are being transformed into permanent medical or disease prevention centers. But there are 10 new hospitals still being built, with a total budget of $17 million. One possible explanation is that the local governments’ annual budgets were already set at the beginning of this year to cover the construction of fangcang.
One more thing
How smart can and should a public restroom be? At Shanghai’s Hongqiao railway station, a big screen displays real-time information about which stalls and urinals are occupied and which are not. I understand the idea is to guide a passenger to an empty spot faster, but hear me out—maybe not everything needs to be “smartified.”