28 October 2014

I am very much looking forward to meeting Tim Berners-Lee in November at the EIF event on Magna Carta for the World Wide Web.

The Magna Carta is a 13th-century English charter aimed at guaranteeing basic rights and freedoms. Tim Berners-Lee recently made the case for a Magna Carta for the Web in his TED talk.

For years now, I have been involved in Privacy discussions from a policy standpoint, and frankly, these discussions generate more heat than light. By that I mean that everyone has a view on Privacy and expresses it, but very few insights emerge. Most people can do little about privacy anyway unless they are a major web or social media player.

To make matters worse, people’s behaviour contradicts their concerns. Everyone is concerned about Privacy, but everyone will merrily use their social media accounts to log on to other sites (thereby providing a ‘cookie trail’ of all their activity on the Web)!

But this time, if Tim Berners-Lee could repeat the success of the World Wide Web, it could make a difference.

So, what is Tim Berners-Lee saying?

Essentially, Tim Berners-Lee says people need to reclaim ownership of their data and start making use of it for their own benefit.

According to the Guardian and others who have quoted his recent talks (emphasis mine):

  • Users should own their own data and be free to merge it with other sets as and when it could provide them useful insight, he said.
  • “That data that [firms] have about you isn’t [as] valuable to them as it is to you,” he said. “I have almost a year’s worth of data from Moves [Facebook-owned location-tracking app]. I can see how my exercise has gone up and down.”
  • “In general… if you put together all that data, from my wearable, my house, from other companies like the credit card company and the banks, from all the social networks, I can give my computer a good view of my life, and I can use that. That information is more valuable to me than it is to the cloud.”
  • [To public data providers]: “I’m not interested in your data; I’m interested in merging your data with other data. Your data will never be as exciting as what I can merge it with.”
  • “We turn tracking around: …make tracking something that we do to the people who use our data.” That way, we would not have to completely lock down sensitive information such as our health data, so that if we’re in a car accident, the right person can still access important information – but only by notifying us that they have done so.
  • Such a reversal of norms could do great things for medical research, he said, citing the example of a theoretical, wide-ranging study into the long-term effects of a popular medicine. “If you give [people] the ability to see how [data is] used and you ban its misuse then people are much more happy to open up to their data being used. Finding drugs, we need to be able to look at a massive amount of data.”
  • "I would like us to build a world in which I have control of my data. I can sell it to you and we can negotiate a price, but more importantly I will have legal ownership of all the data about me," 
  • "We will be able to write really neat applications that take data from all the different parts of my life, and my friends' lives and my family’s lives, and really help me live life in a more healthy way."
  • In the future, data will work in the same way that calendars work today. (Telegraph) Just as each person has their own personal calendar, and chooses to invite certain people to share events, people will have better control over what data they share with others. 
  • He added that data will be more valuable to the person to whom it pertains than any company or organisation that may want to exploit it for money. He pointed to smart home technology and wearable devices as examples of how people can harness their own data for personal benefit.
  • “Some people say privacy is dead – get over it. I don't agree with that. Privacy is very important,” he said. “People and companies function by having an information boundary, and the information within that boundary defines the group. That’s the way society works. The idea that privacy is dead is hopeless and sad. We have to build systems which allow privacy.” 
  • Berners-Lee said that as soon as people claim ownership of their data, they will be able to start making strategic decisions about where to use it. This will lay the groundwork for a whole range of transformative services.
  • For example, if the healthcare industry explained to people exactly how their medical data was being used and reassured them that tight security controls were in place around it, then they would be much more willing to share their data for medical research.
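The idea of "turning tracking around" in the list above can be made a little more concrete with a small sketch. This is only my illustration of the principle, not anything Tim Berners-Lee has proposed as a design: the names PersonalDataStore, AccessEvent and notify are hypothetical, and a real system would also need authentication, encryption and legal backing. The point it shows is narrow: the data can be read by the right person in an emergency, but every read is logged and reported to the owner.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class AccessEvent:
    """One record of somebody reading the owner's data (hypothetical type)."""
    accessor: str
    key: str
    reason: str
    when: datetime


@dataclass
class PersonalDataStore:
    """Hypothetical store in which every read is logged and reported to the owner."""
    owner: str
    notify: Callable[[AccessEvent], None]  # assumed callback, e.g. send a message
    _records: dict = field(default_factory=dict)
    _audit_log: list = field(default_factory=list)

    def put(self, key, value):
        """Store a piece of the owner's data under a key."""
        self._records[key] = value

    def read(self, accessor, key, reason):
        """Return the value, but only together with a log entry and a notification."""
        value = self._records[key]  # raises KeyError if there is no such record
        event = AccessEvent(accessor, key, reason, datetime.now(timezone.utc))
        self._audit_log.append(event)
        self.notify(event)
        return value

    def audit_log(self):
        """Return a copy of the log so callers cannot alter the original."""
        return list(self._audit_log)


# Usage: a paramedic reads a health record and the owner hears about it.
notices = []
store = PersonalDataStore(owner="alice", notify=notices.append)
store.put("blood_type", "O+")
store.read("paramedic-17", "blood_type", "car accident")
print(notices[0].accessor, notices[0].reason)
```

Here the notification is just appended to a list; in practice it could be an email or a message to the owner's phone.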

A Taxonomy of Privacy...

Let’s take a step back and try to understand privacy, because we mean many different things when we talk about Privacy.

A University of Pennsylvania paper gives a good taxonomy of privacy, which I summarize below (the text is taken from the paper):

The first group of activities that affect privacy involves information collection. Surveillance is the watching, listening to, or recording of an individual's activities. Interrogation consists of various forms of questioning or probing for information.

A second group of activities involves the way information is stored, manipulated, and used – what I refer to collectively as “information processing”. Aggregation involves the combination of various pieces of data about a person. Identification is linking information to particular individuals. Insecurity involves carelessness in protecting stored information from leaks and improper access. Secondary use is the use of information collected for one purpose for a different purpose without the data subject's consent. Exclusion concerns the failure to allow the data subject to know about the data that others have about her and participate in its handling and use. These activities do not involve the gathering of data, since it has already been collected. Instead, these activities involve the way data is maintained and used.

The third group of activities involves the dissemination of information. Breach of confidentiality is breaking a promise to keep a person's information confidential. Disclosure involves the revelation of truthful information about a person that impacts the way others judge his/her character. Exposure involves revealing another's nudity, grief, or bodily functions. Increased accessibility is amplifying the accessibility of information. Blackmail is the threat to disclose personal information. Appropriation involves the use of the data subject's identity to serve the aims and interests of another. Distortion consists of the dissemination of false or misleading information about individuals. Information dissemination activities all involve the spreading or transfer of personal data or the threat to do so.

The fourth and final group of activities involves invasions into people's private affairs. Invasion, unlike the other groupings, need not involve personal information (although in numerous instances, it does). Intrusion concerns invasive acts that disturb one's tranquillity or solitude. Decisional interference involves the government's incursion into the data subject's decisions regarding her private affairs.

Why now? 

After the W3C, Tim Berners-Lee turned to the Semantic Web, which he has been speaking of since 2001. In 2004(ish) the world got Web 2.0 – which could be seen as a ‘social’ semantic web (we provide the semantics). Tim Berners-Lee has also been involved with the Open Data Institute.

In 2009 we saw ‘Big Data’. One could argue that the Privacy issues we see today have been amplified by Big Data. Thus, speaking about Big Data now could be a logical progression. The problem of Big Data and Privacy has now moved beyond academia with the Snowden revelations and the ‘right to be forgotten’ regulation. So, in that sense, Tim Berners-Lee looks to the 25th anniversary of the Web to start a discussion on internet rights – including Privacy.

Both Governments and Companies have progressively lowered our expectations of Privacy through new Big Data technologies. As we have seen in the Taxonomy of Privacy, there are many ways Privacy could be breached, and Companies and Governments now have exponentially more powerful technologies to harvest Privacy-related information.

When viewed in this context, the efforts from Tim Berners-Lee could be seen as ‘Rethinking Big Data’.

Ello is the first company in the social space to gain worldwide attention on the basis that it is ad-free. The fact that it is resonating with people to the extent it is means there is a shift (irrespective of how it evolves beyond the initial Buzz – Ello Goodbye). By extension, Tim Berners-Lee could cause a much wider shift in perception for the community globally.

Rethinking Big Data - Rich Data and its implications...

But the real question is: if we rethink Big Data – what are we proposing instead?

This is where the current Magna Carta discussion gets more interesting (Telegraph):

  • “When people talk about big data you tend to think about size. That's interesting in a way, but the key thing is rich data. It's not that there's a lot of it, it's that there's all kinds of data,” said Sir Tim Berners-Lee.
  • “Data comes in all kinds of shapes and sizes and one of the things we've got to do is build computers that are devastatingly powerful and can handle all these types of data.”

Big Data is being used by companies for targeted advertising, and that makes Tim Berners-Lee feel ‘queasy’. But more interestingly, Tim Berners-Lee proposes to rethink Big Data as Rich Data.

In a Forbes interview he elaborates more about Rich Data (again, emphasis mine):

  • “The idea that privacy is dead is hopelessly sad,” Berners-Lee said. “We have to build systems that allow for privacy… People have the right to see how their data is being used.” 
  • He hoped that this mechanism of providing access to one’s personal data would yield what he refers to as “rich data” rather than “big data.” The former would be of value to the user and to others granted permission to use it, whereas the latter is a term he loathes.
  • “In general… if you put together all that data, from my wearable, my house, from other companies like the credit card company and the banks, from all the social networks, I can give my computer a good view of my life, and I can use that. That information is more valuable to me than it is to the cloud.”

Coincidentally, a month ago I blogged about a similar idea which I called 'Small data: A deterministic and a Predictive approach' (extending a talk by Daniel Villatoro of the bank BBVA). Companies like Meeco are also addressing this challenge of empowering the User and represent early attempts to provide the cross-app data insight Tim Berners-Lee speaks of.

Conclusions – Could Rich Data be a viable alternative to Big Data?

So what is Rich Data? It’s Data (and Algorithms) that would empower the individual. According to Tim Berners-Lee: "If a computer collated data from your doctor, your credit card company, your smart home, your social networks, and so on, it could get a real overview of your life." Berners-Lee was visibly enthusiastic about the potential applications of that knowledge, from living more healthily to picking better Christmas presents for his nephews and nieces. This, he said, would be "rich data". (Motherboard)
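To give a feel for what "collating" could mean in practice, here is a minimal sketch that merges records from several personal sources into a single per-day view. It is my own illustration under simple assumptions, not a description of any existing product: the function name merge_by_day, the record layout (dicts with a "date" key) and the sample values are all invented for the example.

```python
from collections import defaultdict


def merge_by_day(*sources):
    """Combine (source_name, records) pairs into one view keyed by date.

    Each record is assumed to be a dict with a "date" key; every other
    field is copied into that day's entry, prefixed with the source name.
    """
    merged = defaultdict(dict)
    for name, records in sources:
        for record in records:
            day = record["date"]
            for field_name, value in record.items():
                if field_name != "date":
                    merged[day][f"{name}.{field_name}"] = value
    # Sort by date string so the view reads chronologically (ISO dates sort correctly).
    return dict(sorted(merged.items()))


# Invented sample data standing in for a wearable and a bank statement.
wearable = [
    {"date": "2014-10-27", "steps": 9200},
    {"date": "2014-10-28", "steps": 3100},
]
bank = [{"date": "2014-10-28", "spend_gbp": 42.5}]

view = merge_by_day(("wearable", wearable), ("bank", bank))
for day, facts in view.items():
    print(day, facts)
```

No single source here is large, but the merged view already says something neither source says alone, which is the sense in which the data is "rich" rather than "big".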

At the beginning of the article, I said that Privacy discussions often generate more heat than light... 

But with talk of Rich Data, we could actually make a difference. More specifically, in my blog post about Small Data, I proposed a new class of Predictive algorithms. A similar approach could be considered for Rich Data also. If we succeed in doing this, we could both gain insights and empower/serve the individual.

I shall elaborate more about Rich Data after I meet Sir Tim Berners-Lee.

by Ajit Jaokar


