When Big Data says “Happy Christmas”, what is the sentiment?
I always say “Happy Christmas”. However, this year, as I write my chosen Christmas messages, I am forced to consider what someone else’s algorithm will imply about me, based on my use of digital words.
I want to explore in this ViewPoint, through the use of a “Happy Christmas” message, the level of TRUST already granted to something we cannot touch in a digital world.
Scene setting – Trust and Sentiment
Let’s consider the word ‘happy’ and what it could imply. Taken out of the context of “Happy Christmas”, the word’s seasonal abundance could wrongly suggest that everyone is now happier. That would not only be misleading but could lead to personalisation errors later. The same principle applies to the word ‘merry’: it would be wrong to assume that its current surge in use means we have all drunk more. This simplistic view demonstrates how such simple words can create complex sentiment analysis problems.
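To make the flaw concrete, here is a minimal sketch of the kind of naive, lexicon-based scoring that would misread seasonal greetings. The word list is a tiny hypothetical example, not any real sentiment lexicon:

```python
# Naive lexicon-based sentiment: count "positive" words in a message.
# POSITIVE_WORDS is a hypothetical toy lexicon for illustration only.
POSITIVE_WORDS = {"happy", "merry", "joy", "love"}

def naive_sentiment(message: str) -> int:
    """Return a crude positivity score: +1 per positive word found."""
    words = (w.strip(".,!?\"'").lower() for w in message.split())
    return sum(1 for w in words if w in POSITIVE_WORDS)

# Seasonal greetings look "positive" regardless of what the sender feels...
print(naive_sentiment("Happy Christmas!"))            # 1
print(naive_sentiment("Merry Christmas, with love"))  # 2
# ...and negation is invisible to a word count:
print(naive_sentiment("I am not happy about this"))   # 1
```

The last line is the point: a word count scores “not happy” as positive, just as it would count every December greeting as evidence that everyone is happier.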
Just to stretch the thinking further, let’s consider the ethics of the person who wrote the computer program (code) on the device you are using to view this, or the algorithm behind your favourite search index. Not only can your words easily be misunderstood and taken out of context, but the analysis that determines or suggests something about you can be flawed, either because the algorithm itself is flawed or because the person who wrote the code has a different outlook or culture. Therefore, just imagine how much we need to TRUST someone who is trying to provide a SENTIMENT analysis based on what you have written, without the context of human signals and other environmental data.
What does data really tell me?
The honest and truthful answer is not much, but I like to pretend that even the smallest snippet of data can tell me everything, and that with some insightful tools, code and algorithms I can predict what you will think next. If you think about your DNA as a tiny snippet of data, it can indicate many things about the physical you, but it will never tell me what you are doing right now, who your friends are, what dreams you have or what you will eat tomorrow. An important question is what I can really extract or imply from data or your digital footprint. From this we need to determine what crosses the creepy line, and whose culture and ethics we are working to.
What data can I collect from your Christmas time digital interactions?
I can collect your words (verbal and written), who you send messages to, who responds, what time you sent and responded, how often, the location, the time taken to prepare messages, websites visited, clicks, links, data volumes, who influences you, TV viewing, music listened to, which device… In reality, everything you do in a digital world I can gather, harvest, collect or be given. It is not easy, but I can do it.
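As a sketch only, with invented field names (this is not any real provider’s schema), the metadata surrounding a single greeting might be recorded as something like this, with behavioural signals such as response latency derived from the timestamps:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Hypothetical record of the metadata around ONE "Happy Christmas" message.
# Every field name here is illustrative, not a real schema.
@dataclass
class MessageEvent:
    sender: str
    recipient: str
    text: str
    sent_at: datetime
    replied_at: Optional[datetime] = None  # None if no reply (yet)
    device: str = "unknown"
    location: str = "unknown"

    @property
    def seconds_to_reply(self) -> Optional[float]:
        """Response latency: one of the behavioural signals listed above."""
        if self.replied_at is None:
            return None
        return (self.replied_at - self.sent_at).total_seconds()

event = MessageEvent(
    sender="alice", recipient="bob", text="Happy Christmas!",
    sent_at=datetime(2011, 12, 25, 9, 0),
    replied_at=datetime(2011, 12, 25, 9, 30),
    device="phone", location="London",
)
print(event.seconds_to_reply)  # 1800.0
```

Note that none of the interesting fields is the message text itself: who, when, from where and how quickly are all metadata, gathered as a side effect of sending the greeting.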
Given that this ViewPoint is exploring TRUST from the stance of data analysis and the ability to derive your sentiment or intent from your data, and knowing that gathering the data is possible, where next? To be clear, when I use ‘sentiment’ I am seeking to understand and present your emotion: what you really mean (or meant), what you wanted to imply, and what level of TRUST is assumed in my interpretation (how close am I?).
All of this only has value to you and me if I deliver a personal report after Christmas saying how many cards you sent and received, from whom, and the sentiment of what you said and what was said to you. Hence my interest in TRUST: do you think I got it right, and if so, do you believe what I am saying about others’ sentiment towards you?
It now gets complex.
Let’s assume you have stated a faith or religious preference on some social network. Using this snippet of data (knowledge) together with “Happy Christmas”, what could I infer, and at what point does a digital interpretation of your data become creepy and dangerous? Here is a scenario…
A Jewish Orthodox friend of mine responds to my “Happy Christmas” message. Does the algorithm that analyses my data conclude that I am not sensitive to someone else’s views, or that their wishing me “Happy Christmas” back undermines their belief? What happens when my friend’s post-Christmas report finds its way to the Chief Rabbi, who now wants to know why my friend is wishing everyone “Happy Christmas”? Was my friend being sensitive to me, enjoying the warm wishes, happy to hear from me, or something else? Would, should, or can the analysis be different if my friend is a fellow Christian, a progressive Jew, a Muslim, a Hindu or an atheist? Consider the same issues when I am wishing my friends a Happy Diwali or asking how Ramadan is going.
Writing an algorithm that understands human nature, takes into account our own experiences and personal history, and considers others, is not simple. The algorithm (even if it worked) is also likely to diverge from reality, as we tend to deny the output (reality) if it is too close to being true. But…
Who wrote the algorithm and who wrote the code?
One aspect we worry about is the collection and storage of data, because we can see, touch and understand it. The range of concerns is very wide, from CCTV to the data from our mobile phones. I can easily gather data about your “Happy Christmas” messages. Some worry for you about PII (Personally Identifiable Information): where it is and how it is protected. Others are concerned about how anonymous data really is, and a few even about how I could reconstruct data to identify you. We should all be very grateful that some great minds worry about these important issues and debate the impacts on your data. However, I am currently thinking about who writes the algorithm and code that takes this data and creates value for you and someone else. Should we do this analysis at all? Is your sentiment more private than your public views?
Valuable, intrusive, creepy or wrong.
Is sentiment analysis valuable, intrusive, creepy or wrong? Everyone will have a view, and I am sure that we can segment the market; with data I can tell where you fit in the range. However, your view could be based on what you don’t want to face.
Imagine you are about to buy a present for your partner. Based on your location, or the websites you visited immediately before the one where you made the purchase, I could determine a sentiment towards that person, and they could have access to this analysis. Does understanding how you spent the time before an action help in a decision about your sentiment of care, love or affection?
Do you want Apple, Google, Samsung, your bank, your mobile operator or your loyalty card provider to know that you are single, and that at the office party you sent a text that was fun at the time, but the person who received it, and their service providers, know the sentiment given the circumstances, and that your credibility, influence or reputation has been increased or reduced? Now you can argue with me that this is not possible, that it is invasive, that it removes all human dignity and that what you do is unique; so when you have read “Predictably Irrational”, let’s have that chat.
The Semantic Web
If the next phase of the web (Web 3.0, the intelligent web, the semantic web) is one where the web knows what you want to do before you do, there are some complexities we need to face when we wish someone a “Happy Christmas!” Even if we could ignore the global economic crisis, we live in tricky digital times: we now have the data, but are we ready to understand how to use it and accept it for what it is? When the web has an understanding, insight, view, opinion or knowledge about you, can we accept that it may tell us something we don’t want to face up to? Sentiment is more than a word or a phrase; it is linked to what we do, when we did it, with whom, with what thought and at what time. One issue is the algorithm that takes this data and creates a view about sentiment; another is the bias, culture, views, opinions and motivations of the programmer, data scientist or coder, about whom you can do nothing but TRUST.
Politically we ask who polices the police; maybe it is time to ask how we confirm that our TRUST is correctly placed in those building web services. The power is not with a regulator, or in public or private law, but in how we accept transparency and live with the fact that we humans are all different yet all the same. Much like DNA, data is all the same at one level (ones and zeros), but the bigger the data gets, the more unique it becomes, just like us.
This issue is not about what is Private or Public, but about Rights
A New Year provides time to reflect and look forward. From my narrow view of digital identity, data, reputation, sentiment, devices and networking, I would say that 2011 was driven by privacy issues at many different levels. Going forward, I believe, will be a time of much wider realisation and acceptance that private, privacy and public are not the debate; the issue is that no-one has control of my, your, our data, and that we need to start thinking about rights: who grants them, who provides command and governance, who has access, how your data can be used, and how digital citizens can get value from their data.
You cannot control it, and your data is out there. However, should you have the right to revoke your phone number from someone else’s phone book, or should they be able to access your sentiment for the message just sent to you?
I wanted to explore in this ViewPoint, through the use of a “Happy Christmas” message, how much TRUST we have granted to something (the algorithm and coder) that we cannot touch in a digital world. I hope you can see that while there are those who worry about the privacy of data, in so many ways this is just the tip of the iceberg.
Here is your chance to vote on the integrity of my “Happy Christmas”. Do you believe in the sentiment of my “Happy Christmas” message? Please vote here!