Many websites come with web forms, for example, to sign in to an account, create a new account, leave a public comment or contact the website owner. What most Internet users may not know is that data typed on sites may be collected by third-party trackers, even before the form is submitted.
A research team from KU Leuven, Radboud University and University of Lausanne analyzed data collection by third-party trackers on the top 100K global websites. Results have been published in the research paper Leaky Forms: A Study of Email and Password Exfiltration Before Form Submission.
Leaked data included personal information, such as the user’s email address, names, usernames, messages that were typed into forms and also passwords on 52 occasions. Most users are unaware that third-party scripts, which include trackers, may collect this kind of information when they type on sites. Even when submitting content, most may expect it to be confidential and not leaked to third parties. Browsers do not reveal the activity to the user; there is no indication that data is collected by third-party scripts.
Results differ based on location
Data collection differs depending on the user’s location. The researchers evaluated the effect of user location by running the tests from locations in the European Union and the United States.
The number of email leaks was 60% higher for the location in the United States than it was for the location in the European Union. In numbers, emails were leaked on 1844 sites when connecting to the top 100k websites from the European Union and on 2950 sites when connecting to the same set of sites from the United States.
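The 60% figure follows directly from the two site counts. A minimal sketch of the arithmetic, assuming the counts are the desktop figures and that exactly 100,000 sites were crawled from each location (the helper name `relative_increase` is made up for this example):

```python
def relative_increase(baseline: int, value: int) -> float:
    """Return how much larger `value` is than `baseline`, in percent."""
    return (value - baseline) / baseline * 100


# Figures quoted in the article; the crawl size of 100,000 is an assumption
# based on the "top 100K websites" wording.
EU_LEAKY_SITES = 1844
US_LEAKY_SITES = 2950
CRAWLED_SITES = 100_000

us_vs_eu = relative_increase(EU_LEAKY_SITES, US_LEAKY_SITES)
print(f"US vs EU: {us_vs_eu:.1f}% more sites with email leaks")
print(f"EU share of crawled sites: {EU_LEAKY_SITES / CRAWLED_SITES:.2%}")
print(f"US share of crawled sites: {US_LEAKY_SITES / CRAWLED_SITES:.2%}")
```

Seen this way, the relative gap between the two locations is large, while the absolute share of leaking sites stays in the low single digits in both cases.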
The majority of sites, 94.4%, that leaked emails when connecting from the EU location did leak emails when connecting from the US as well.
Leakage when using mobile web browsers was slightly lower in both cases. 1745 sites leaked email addresses when using a mobile browser from a location in the European Union, and 2744 sites leaked email addresses from a location in the United States.
More than 60% of leaks were identical on desktop and mobile versions according to the research.
The mobile and desktop websites where emails are leaked to tracker domains overlap substantially but not completely.
One explanation for the difference is that the mobile and desktop crawls did not take place at the same time but one month apart. Some trackers were found to be active on mobile or desktop sites only.
The researchers suggest that stricter European privacy laws play a role in the difference between the two locations. The GDPR, General Data Protection Regulation, applies when sites and services collect personal data. Organizations that process personal data are responsible for complying with the GDPR.
The researchers believe that email exfiltration by third parties “can breach at least three GDPR requirements”.
First, if such exfiltration happens surreptitiously, it violates the transparency principle.
Second, if such exfiltration is used for purposes such as behavioral advertising, marketing and online tracking, it also breaches the purpose limitation principle.
Third, if the email exfiltration is used for behavioral advertising or online tracking, the GDPR typically requires the website visitor’s prior consent.
Only 7720 sites in the EU crawl and 5391 sites in the US crawl displayed consent popups when connecting; that’s 7.7% of the sites visited from the EU and 5.4% of the sites visited from the US.
The researchers discovered that rejecting all data processing in consent popups reduced the number of sites with leaks by 13% in the US and by 0.05% in the EU. Most Internet users might expect a reduction of 100% when they do not give consent, but this is apparently not the case. The low decrease in the EU is likely caused by the low number of websites with both detected cookie popups and observed leaks.
Site categories, trackers and leaks
The researchers sorted sites into categories such as fashion/beauty, online shopping, games, public information and pornography. Sites in all categories, with the exception of pornography, leaked email addresses.
Fashion/Beauty sites leaked data in 11.1% (EU) and 19.0% (US) of all cases, followed by Online shopping with 9.4% (EU) and 15.1% (US), and General News with 6.6% (EU) and 10.2% (US). The fourth place differed by location: Software/Hardware with 4.9% in the EU and Business with 6.1% in the US.
Many sites embed third-party scripts, usually for advertising purposes or website services. These scripts may track users, for example, to generate profiles to increase advertising revenue.
The top sites that leaked email address information were different depending on the location. The top 3 sites for EU visitors were USA Today, Trello and The Independent. For US visitors, they were Issuu, Business Insider, and USA Today.
Further analysis of the trackers revealed that a small number of organizations were responsible for the bulk of form data leaks. Values were once again different depending on location.
The five organizations that operate the largest number of trackers on sites that leak form data were Taboola, Adobe, FullStory, Awin Inc. and Yandex in the European Union, and LiveRamp, Taboola, Bounce Exchange, Adobe and Awin in the United States.
Taboola was found on 327 sites when visiting from the EU, LiveRamp on 524 sites when visiting from the US.
Protection against third parties that leak form data
Web browsers do not reveal to users if third-party scripts collect data that users input on sites, even before submitting. While most browsers, with the notable exception of Google Chrome, include anti-tracking functionality, it appears that this functionality is not suitable for protecting user data against this form of tracking.
The researchers ran a small test using Firefox and Safari to find out if the default anti-tracking functionality blocked data exfiltration on the sample. Both browsers failed to protect user data in the test.
Browsers with built-in ad-blocking functionality, such as Brave or Vivaldi, and ad-blocking extensions such as uBlock Origin, offer better protection against data leaking. Users on mobile devices may use browsers that support extensions or include ad-blocking functionality by default.
The researchers developed the browser extension LeakInspector. Designed to inform users about sniffing attacks and to block requests that contain personal information, LeakInspector protects users’ data while active.
The extension’s source is available on GitHub. The developers could not submit the extension to the Chrome Web Store, as it requires access to features that are only available in Manifest V2. Google accepts only Manifest V3 extensions in its Chrome Web Store. A Firefox version is being published on the Mozilla Add-ons store.
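To give an idea of what checking a request for personal information can involve, here is a minimal Python sketch. It is not LeakInspector's code: the function names `candidate_encodings` and `payload_leaks`, the tracker URL and the list of encodings (URL encoding, Base64, MD5, SHA-256) are assumptions chosen for illustration. The idea is simply that a typed value may leave the browser in a transformed form, so a check needs to look for more than the plain text.

```python
import base64
import hashlib
from urllib.parse import quote


def candidate_encodings(value: str) -> set[str]:
    """Build the strings a script might send instead of the raw value.

    The selection of encodings is an assumption made for this example.
    """
    raw = value.encode("utf-8")
    return {
        value,                                  # plain text
        quote(value, safe=""),                  # URL-encoded
        base64.b64encode(raw).decode("ascii"),  # Base64
        hashlib.md5(raw).hexdigest(),           # MD5 hex digest
        hashlib.sha256(raw).hexdigest(),        # SHA-256 hex digest
    }


def payload_leaks(payload: str, typed_value: str) -> bool:
    """Return True if the payload contains the typed value in any candidate form."""
    return any(needle in payload for needle in candidate_encodings(typed_value))


# Hypothetical example: an email address sent as a SHA-256 hash.
email = "user@example.com"
hashed = hashlib.sha256(email.encode("utf-8")).hexdigest()
print(payload_leaks(f"https://tracker.example/collect?e={hashed}", email))  # True
print(payload_leaks("https://tracker.example/collect?page=home", email))    # False
```

A simple substring check like this misses values that are compressed, encrypted or encoded in several layers, so it should be read as a lower bound on what a real detector has to handle.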
Now You: what is your take on this?