This article is part of our exclusive IEEE Journal Watch series in partnership with IEEE Xplore.
The messages you type can be decoded from the mere sound of your fingers tapping the keys, according to a recent paper by researchers at Durham University, the University of Surrey, and Royal Holloway, University of London.
The researchers trained two machine-learning models to recognize the distinctive clicks from each key on an Apple laptop keyboard. The models were trained on audio collected from two sources: a smartphone placed nearby and a video call conducted over Zoom. They report an accuracy of 95 percent for the smartphone audio model and 93 percent for the Zoom call model.
These models could make possible what’s known as an acoustic side-channel attack. While the attack presented in this paper relies on contemporary machine-learning methods, such attacks date back at least to the 1950s, when British intelligence services surreptitiously recorded the mechanical encryption devices used by the Egyptian government.
An acoustic side-channel attack on a laptop estimates which keys were pressed, and in which order, from audio recordings of someone typing on it. Such attacks can reveal a user’s sensitive information, like bank PINs, account passwords, or government credentials.
The team’s models are built around convolutional neural networks, or CNNs. Just as such networks can recognize faces in a crowd, so can they recognize patterns in a spectrogram, a visual representation of how an audio signal’s frequency content changes over time. The program isolates the audio of each keypress, transforms its waveform into a spectrogram, extracts the frequency pattern of each click, and computes the relative probability that a given key was pressed.
“We considered the acoustic data as an image for the CNN,” says Ehsan Toreini, a coauthor of the report. “I think that is the core reason our method works so well.”
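To make that pipeline concrete, here is a minimal sketch in Python of the general approach: an isolated keystroke clip is converted to a log-mel spectrogram and fed to a small CNN that outputs a probability for each key. The architecture, key set, and parameters below are illustrative assumptions, not the authors’ actual model.

```python
# Sketch of a keystroke classifier: each isolated keystroke clip becomes a
# spectrogram "image," and a small CNN assigns a probability to every key.
# All sizes and layer choices here are illustrative assumptions.
import numpy as np
import librosa
import torch
import torch.nn as nn

NUM_KEYS = 36  # assumption: 26 letters + 10 digits


def keystroke_to_spectrogram(clip: np.ndarray, sr: int = 44_100) -> torch.Tensor:
    """Convert one isolated keystroke waveform into a log-mel spectrogram."""
    mel = librosa.feature.melspectrogram(y=clip, sr=sr, n_mels=64)
    log_mel = librosa.power_to_db(mel, ref=np.max)
    # Shape (1, 1, n_mels, time): a single-channel "image" for the CNN.
    return torch.tensor(log_mel, dtype=torch.float32)[None, None, :, :]


class KeystrokeCNN(nn.Module):
    def __init__(self, num_keys: int = NUM_KEYS):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),  # fixed-size features for any clip length
        )
        self.classifier = nn.Linear(32 * 4 * 4, num_keys)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.classifier(self.features(x).flatten(1))
        return logits.softmax(dim=-1)  # relative probability per key


# Usage: feed an isolated keystroke clip and read off the most likely key.
model = KeystrokeCNN()
clip = np.random.randn(22_050).astype(np.float32)  # stand-in for a real recording
probs = model(keystroke_to_spectrogram(clip))
print("most likely key index:", int(probs.argmax()))
```

A real attack would train such a network on thousands of labeled keystrokes; the untrained network above produces meaningless predictions, but the data flow, from waveform to spectrogram to per-key probabilities, mirrors the approach the paper describes.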
The attack presented in the paper is limited in scope. The two audio-decoding models were trained and evaluated on data collected from a single user typing on a single laptop, and the training process requires that each recorded key sound be paired with a label identifying the key pressed. It remains to be seen how effective the attack would be against other laptop models, other users, and other audio environments, and the need for labeled training data limits how widely such a model could be deployed.
Still, there are plausible scenarios in which an attacker would have access to labeled audio data of a person typing. Though that data may be difficult to collect covertly, a person could be coerced into providing it. In a recent interview on the Smashing Security podcast, Toreini and coauthor Maryam Mehrnezhad describe a hypothetical scenario in which a company requires new employees to provide that data so that they can be monitored later on. In an interview with IEEE Spectrum, Mehrnezhad said that “another example would be intimate partner violence. An ex-partner or current partner could be a bad actor in that scenario.”
The research team presents several ways to mitigate the risks of this attack. For one, you could simply type faster: touch-typing blends the sounds of individual key presses, which complicates keystroke isolation and decoding. Systemic changes would also help. Video call services like Zoom could introduce audio noise or distortion into recordings to prevent machine-learning models from easily matching the audio to typed characters.
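As a toy illustration of that masking idea (not a technique from the paper itself), a call client could mix low-level noise into outgoing audio before transmission, blurring the spectral fingerprints of individual keystrokes. The noise level here is an arbitrary, illustrative choice:

```python
# Illustrative noise masking: add faint white noise to outgoing call audio so
# keystroke transients are harder to classify. Assumes float samples in [-1, 1];
# the 0.02 amplitude is an arbitrary example value, not a recommendation.
import numpy as np


def mask_keystrokes(audio: np.ndarray, noise_level: float = 0.02) -> np.ndarray:
    noise = np.random.randn(len(audio)).astype(audio.dtype) * noise_level
    return audio + noise
```

In practice a service would need to tune such masking so that it degrades keystroke classification without noticeably harming speech quality.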
“The cybersecurity and privacy community should come up with more secure and privacy-preserving solutions that enable people to use modern technologies without risk and fear,” says Mehrnezhad. “We believe that there is room for industry and policymakers to find better solutions to protect the user in different contexts and applications.”
The researchers presented their paper at the recent 2023 IEEE European Symposium on Security and Privacy Workshops.