Hundreds of the world’s top sites are recording users’ keystrokes in real-time and sending them to third-party servers, exposing potentially sensitive data to the risk of theft, according to new research.
Princeton researchers Steven Englehardt, Gunes Acar and Arvind Narayanan investigated the widespread use of session replay scripts used by website owners to record keystrokes, mouse movements and scrolling behavior, along with the entire content of visited pages.
These scripts, provided by third-party analytics companies, are intended to record full browsing sessions, which the site owner can play back to learn how the site is being used and how it can be improved.
This means that even information subsequently deleted by the user is recorded and can be played back.
“However, the extent of data collected by these services far exceeds user expectations; text typed into forms is collected before the user submits the form, and precise mouse movements are saved, all without any visual indication to the user,” Englehardt wrote.
“This data can’t reasonably be expected to be kept anonymous. In fact, some companies allow publishers to explicitly link recordings to a user’s real identity.”
The trio studied seven of the top session replay companies — Yandex, FullStory, Hotjar, UserReplay, Smartlook, Clicktale, and SessionCam — and found their services in use on 482 of the Alexa top 50,000 sites.
The issue is that sensitive information entered by a user, including medical conditions, credit card details and more, could end up leaking to the third-party provider’s servers.
“This may expose users to identity theft, online scams, and other unwanted behavior,” Englehardt added. “The same is true for the collection of user inputs during checkout and registration processes.”
The researchers highlighted four vulnerabilities: attempts to automatically exclude password input fields from recordings often failed; sensitive data is often redacted in a partial and imperfect way; recording services increase exposure to data breaches; and session recording companies expect sites to manually label all PII, which doesn’t happen.
“The replay services offer a combination of manual and automatic redaction tools that allow publishers to exclude sensitive information from recordings. However, in order for leaks to be avoided, publishers would need to diligently check and scrub all pages which display or accept user information,” explained Englehardt.
“For dynamically generated sites, this process would involve inspecting the underlying web application’s server-side code. Further, this process would need to be repeated every time a site is updated or the web application that powers the site is changed.”
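To illustrate why automatic exclusion alone can miss sensitive fields, here is a minimal Python sketch. It assumes a hypothetical recorder whose only automatic rule is to skip inputs declared as type="password"; the rule, the sample form, the field names and the `recorded_fields` helper are all illustrative assumptions and do not describe any named vendor's product.

```python
from html.parser import HTMLParser

# Hypothetical automatic rule: skip only <input type="password">.
AUTO_EXCLUDED_TYPES = {"password"}


class InputCollector(HTMLParser):
    """Collects (name, type) for every <input> element in a page."""

    def __init__(self):
        super().__init__()
        self.inputs = []

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # HTML treats a missing type attribute as type="text".
        input_type = (attributes.get("type") or "text").lower()
        self.inputs.append((attributes.get("name") or "", input_type))


def recorded_fields(html, manually_excluded=frozenset()):
    """Return the names of inputs that would still be recorded."""
    collector = InputCollector()
    collector.feed(html)
    return [
        name
        for name, input_type in collector.inputs
        if input_type not in AUTO_EXCLUDED_TYPES
        and name not in manually_excluded
    ]


# Illustrative form: a "show password" field rendered as plain text
# and a card number field with no type attribute at all.
page = """
<form>
  <input name="email" type="email">
  <input name="password" type="password">
  <input name="password_visible" type="text">
  <input name="card_number">
</form>
"""

print(recorded_fields(page))
print(recorded_fields(page, manually_excluded={"password_visible", "card_number"}))
```

In this sketch the type-based rule alone still records the visible password field and the card number. Only the manual exclusion list removes them, and that list would have to be revisited whenever the form changes.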
Paul Edon, director at Tripwire, claimed this activity is little different from that of cyber-criminals and could even breach regulatory requirements such as PCI DSS and the forthcoming GDPR.
“If these websites do not alert the user to the fact that they are recording keystrokes, then I would class this under ‘nefarious activity’ as it is being less than honest, and the information is being collected without the user’s knowledge,” he argued.
“The collection and storage of information not submitted by a potential customer will definitely be a breach of the EU GDPR, as permission to collect, store and process the data has not been given.”