Recently there have been some concerns about the privacy of the new feature we added to the dash, which lets it query external resources to provide related results. I wanted to follow up with some further details about how these searches are performed, the privacy protections that are in place, and the further work that is under way.
I reached out to John Lenton, who is the Senior Engineering Manager in the Online Services team at Canonical. He was responsible for building the technology that handles the searches from the dash. He says:
“When performing a search, you expose no more information to Canonical than the originating IP of your request, the search terms you enter, and the result you click on (if any). We don’t perform any kind of “tracking”; there is nothing really user-identifiable there…the IP address is unreliable for this, and isn’t relied on other than for collapsing multiple searches into one in the reporting, and even this is after passing it through a one-way hash.
Searches are currently performed over plain HTTP to our servers in a data-centre in either London or the USA, and then forwarded to the upstream providers appropriate to the originating request’s geolocation. The only potentially identifying bit of information, the IP address of the originating request, is not forwarded unless explicitly required to perform the search (so far, only one of 20+ upstream providers requires this: the Headweb video source for Scandinavian countries needs to do its own geoip).
We appreciate some of the community concerns about these searches operating unencrypted and we are currently working to encrypt these dash searches ready for the release of this feature in Ubuntu 12.10. This should resolve most of the concerns shared about unencrypted traffic.
In terms of logging, the raw httpd logs are only visible to a small group of people whose job requires that they have access and who are trained in respecting people’s privacy in accordance with European law on this matter. The searches themselves, stripped of the IP addresses (replacing them with a one-way hash), are made available to a slightly larger group of people to enable statistical reporting. Because not only the search but also clicking on a result reaches our server (where it is redirected to whatever is appropriate), we will be able to infer what search results people want when searching for particular terms, and at some point in the future this will be used to help us provide better, more relevant results. This statistical gathering of a mapping of search terms to clicked search results is not done yet but will be done soon.”
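To make the handling John describes a little more concrete, here is a minimal sketch in Python. It is not Canonical's actual code. The function names, the record fields, the provider identifiers, and the choice of keyed HMAC-SHA256 as the one-way hash are all assumptions made for illustration; the only details taken from the description above are that the IP address is forwarded only to a provider that requires it, that reporting data carries a one-way hash in place of the IP, and that the hash is used to collapse multiple searches into one.

```python
import hashlib
import hmac

# Hypothetical secret key. The source does not say whether the hash is keyed.
HASH_KEY = b"example-secret-key"

# Hypothetical provider identifiers. Headweb is the only provider described
# as needing the originating IP address.
PROVIDERS_NEEDING_IP = {"headweb"}


def hash_ip(ip: str) -> str:
    """Return a one-way digest of an IP address (HMAC-SHA256 is an assumption)."""
    return hmac.new(HASH_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def build_upstream_request(provider: str, terms: str, client_ip: str) -> dict:
    """Build the data forwarded upstream, adding the IP only when required."""
    request = {"provider": provider, "q": terms}
    if provider in PROVIDERS_NEEDING_IP:
        request["client_ip"] = client_ip
    return request


def anonymise_record(record: dict) -> dict:
    """Copy a log record, replacing the raw IP with its one-way hash."""
    cleaned = {k: v for k, v in record.items() if k != "ip"}
    cleaned["ip_hash"] = hash_ip(record["ip"])
    return cleaned


def collapse_searches(records: list) -> list:
    """Keep one record per (ip_hash, terms) pair, in first-seen order.

    Treating identical pairs as duplicates is an assumed reading of
    "collapsing multiple searches into one".
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["ip_hash"], rec["terms"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```

In this sketch, `build_upstream_request("example-music", "beatles", "203.0.113.7")` returns a request with no `client_ip` field, while the same call with `"headweb"` includes it. Keying the hash is a design choice of the sketch: an unkeyed hash over the small IPv4 address space could be reversed by trying every address.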
Please feel free to follow up with any further questions, and we will try to get them answered.