Poland-resident privacy expert Alexander Hanff was setting up a new website for one of his privacy projects. He wanted to be able to gather statistics on visitors without collecting any personal information that might impinge on their privacy. Rather than rely on Apache's access logs – which he describes as 'a little cumbersome' – he decided to save acceptable data to a database that could then be used to generate tables and charts.
In particular, he wanted to gather geographical statistics on visitors without having to collect IP addresses, and chose an Apache module called GeoIP to do it. But since the site was being developed on his local network and was not directly addressable from the internet, GeoIP didn't function properly on the test site.
To check it, he uploaded his data collection script to his production website. "I then sent myself the following text link via a DM in TweetDeck: http://mydomain.com/stats.php?ref=twitter."
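The sketch below illustrates the kind of collection script described here. It is not Hanff's stats.php, whose source has not been published, and it is written in Python rather than PHP. It assumes the GeoIP module exposes the visitor's country to the script as a GEOIP_COUNTRY_CODE environment variable. The table layout and the function names make_db and record_visit are hypothetical.

```python
import sqlite3
from urllib.parse import parse_qs


def make_db(path=":memory:"):
    """Create a hypothetical stats table that has no column for IP addresses."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS visits (country TEXT NOT NULL, ref TEXT NOT NULL)"
    )
    return conn


def record_visit(conn, environ):
    """Store only the country code and the 'ref' tag from the query string.

    `environ` is a CGI/WSGI-style dict. REMOTE_ADDR may be present in it,
    but it is deliberately never read or stored.
    """
    # Assumption: the GeoIP module has already filled in this variable.
    country = environ.get("GEOIP_COUNTRY_CODE", "unknown")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    ref = query.get("ref", ["direct"])[0]
    conn.execute("INSERT INTO visits (country, ref) VALUES (?, ?)", (country, ref))
    conn.commit()


if __name__ == "__main__":
    db = make_db()
    # Simulated request for /stats.php?ref=twitter from a visitor in Poland.
    record_visit(db, {
        "GEOIP_COUNTRY_CODE": "PL",
        "QUERY_STRING": "ref=twitter",
        "REMOTE_ADDR": "203.0.113.5",  # documentation address; ignored by the script
    })
    print(db.execute("SELECT country, ref FROM visits").fetchall())
```

Because the table has no address column, a script of this shape can say where a visitor came from but not which machine made the request, which is why the full Apache logs are still needed to identify an unexpected visitor.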
When he checked his database to make sure that his script was working, he found the relevant entry complete with basic geography: 'PL'. But he also found four other entries marked 'US'. Since this was a private, locally developed site not 'known' to the internet, he was a little surprised.
Hanff went back to the full Apache logs to find the IP addresses of the extra visitors, and discovered that for each of the US entries, "one of Twitter's servers using IP 199.16.156.126 identifying itself as Twitterbot/1.0 had sent a GET request to the URL."
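A check like this can be sketched in a few lines of Python. The example assumes the server writes Apache's standard "combined" log format. The helper name hits_for_path is hypothetical, and the sample log lines are invented for illustration rather than taken from Hanff's logs.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def hits_for_path(lines, path_prefix):
    """Count requests per (ip, user-agent) pair for paths starting with path_prefix."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("path").startswith(path_prefix):
            counts[(match.group("ip"), match.group("agent"))] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines; timestamps and byte counts are placeholders.
    sample = [
        '203.0.113.5 - - [01/Jan/2013:12:00:00 +0000] '
        '"GET /stats.php?ref=twitter HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
        '199.16.156.126 - - [01/Jan/2013:12:00:01 +0000] '
        '"GET /stats.php?ref=twitter HTTP/1.1" 200 512 "-" "Twitterbot/1.0"',
        '199.16.156.126 - - [01/Jan/2013:12:00:02 +0000] '
        '"GET /stats.php?ref=twitter HTTP/1.1" 200 512 "-" "Twitterbot/1.0"',
    ]
    for (ip, agent), n in hits_for_path(sample, "/stats.php").items():
        print(ip, agent, n)
```

Grouping by address and user agent makes repeat visits from a single crawler stand out from the one request the site owner made himself.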
This means, he claims, that Twitter not only reads its users' private messages but also monitors any links they contain and 'gets' a copy of the page concerned.
Earlier this year it was discovered that Microsoft is doing something similar with Skype messages. Microsoft claimed at the time that it was using an automated scanning process to help locate spam websites. Since Microsoft is fully engaged in anti-malware, and uses its own spam database to protect Internet Explorer users, there is a certain credence to these claims. Twitter, however, has no such defense.
"If they wanted to check whether or not a URL is malicious", says Hanff, "they should use the many freely available databases designed explicitly for that purpose. It is both more cost effective - generates far less data traffic which they have to pay for; and more technically efficient." It is more likely, he believes, "that the business value of being able to use the URLs to extend the behavioural profiles they keep on their users for their advertising platform is the real reason for this policy... using a GET request on every single page (to grab a copy) allows them to scan the page for keywords for the behavioural profiles."
Hanff has asked Twitter to explain this practice. He has also "forwarded the evidence to Vice President Reding at the European Commission and will be filing a complaint with the Polish DPA next week."
He had filed a separate complaint with the EC earlier this summer over concerns that Twitter is able to track user clicks via its t.co URL shortener without user approval. The EC’s initial response said that t.co was optional. “This is factually incorrect,” replied Hanff in a new letter to Viviane Reding. “Whereas users can use other URL shortening services Twitter still shortens all URLs by default with their ’t.co’ service even if it is a URL pointing to another URL shortening service.”
At the same time, he explained his new discovery: “Twitter are in fact tracking every single link a user receives in tweets and Direct Messages – not only are these links monitored by Twitter, Twitter servers actively visit every single link a user clicks on even links which are sent by Direct Message which are by definition supposed to be private.”
At the time of writing this report he has heard from neither Twitter nor Viviane Reding's office over the new concerns. He told Infosecurity by email, "I could have it wrong but experience tells me I probably haven’t and the lack of response from Twitter doesn’t help to alleviate my concerns. I don’t think this is part of a Government surveillance program (however sexy Snowden stories might be at the moment) I think it is far more likely to be simply about money."