Lecture 1

Responsible

Before getting deep into the serious stuff, we need to dispel a useless, dangerous misunderstanding. In this course we care about responsibility. We care about how a data scientist can commit to answering an ethical request in order to tackle an issue. We look forward, because we have problems to solve.

Tony Jones. Statue of O'Connor and horse at C.Y. O'Connor beach. by Gnangarra on wikipedia

Sometimes we are driven to think that ethics is about guilt. Guilt is a feeling we experience for something we have done or a thought we have had. It's an unpleasant feeling that most of us experience once or twice in our lives (or, if you are like me, a lot of times). Yet, its unpleasantness is not the problem here. The real problem is that guilt looks back to the past, not forward to the future. As a feeling, it is intended to punish us for our past actions and thoughts. It is intended to identify who is culpable. Moreover, guilt works on individuals, not communities or societies. Many of our issues, requests, and commitments work also at the level of communities and societies. Therefore, guilt is not our game. Forget guilt1.

When we assess and scrutinise behaviours and decisions, we are considering responsibility. Responsibility is not a feeling, it's a possibility. It is the condition of being able to respond to a situation. It requires knowledge and understanding. You can be responsible without being guilty. It does not need to be your fault for you to be responsible. And you don't carry responsibility alone: a community, a group, a society, our species, can be responsible.

If you want to dive into guilt vs. responsibility, Timothy Morton's "Being Ecological" (a short video introduction here) and Donna Haraway's "Staying with the Trouble" are great books highlighting the importance of the shift from guilt to responsibility in these times.

We all live in an information landscape

The why game, part 2

Remember Cambridge Analytica? Toward the end of lab1 we touched upon the ethical consequences of Cambridge Analytica's business practices, and asked ourselves why we thought something acceptable or not. In the afternoon between the lab and the lecture we had the opportunity to gather more information about what happened exactly, so that it would be easier to formulate a clear, succinct ethical statement about the case. We did not bother, yet, to define exactly what an "ethical statement" is, nor to make precise what "acceptable" and "not acceptable" mean. We do share an intuitive, although imperfect, implicit definition, and that's good enough for us so far (yes, it shows a lot that I'm not a philosopher).

CA, the facts (?)

Different sources (including some from within CA itself) reported on what Cambridge Analytica had been up to. We must thank a thorough, coordinated investigative journalism effort (see also here) for the increased attention to the case.

We quote part of the “Cambridge Analytica” wikipedia page.

CA would collect data on voters using sources such as demographics, consumer behaviour, internet activity, and other public and private sources. According to The Guardian, CA used psychological data derived from millions of Facebook users, largely without users' permission or knowledge [...] The company claimed to use "data enhancement and audience segmentation techniques" providing "psychographic analysis" for a "deeper knowledge of the target audience". The company uses the Big Five model of personality. Using what it calls "behavioral microtargeting" the company indicates that it can predict "needs" of subjects and how these needs may change over time. Services then can be individually targeted for the benefit of its clients from the political arena, governments, and companies providing "a better and more actionable view of their key audiences."

Based on this background, was CA a trustworthy data science company or not? In class we did not reach unanimity: this allowed us to have a livelier discussion. Let's keep in mind that we are trying to exercise our ethical thinking, so it's perfectly ok to take different stances on this.

CA, so what?

Using the why game we tried to pinpoint what basic assumptions were driving our conclusions. It took a while ("be patient", remember) but we got down to keywords such as:

  • privacy: accessing data subjects' (people, communities, societies, …) information involves considerations about the subjects' privacy (no matter what they used that information for, nor whether they used the information at all).
  • marketing: producing information has an impact on people, communities, and society; what sort of information is produced is relevant for evaluating whether CA behaved appropriately or not.
  • legality: somebody asked whether CA broke some law or not.
  • propaganda: democratic processes are important and delicate, and interfering with them (whether to support or hinder them) is relevant; so, evaluating CA's impact on the electoral outcome is ethically important.

OpenAI, the facts (?)

OpenAI is a non-profit organization doing research in what they call "friendly AI". The organization advocates for open collaboration in research and in machine learning development. They present themselves as working for the benefit of humanity, and they take seriously topics such as the risks of the "singularity" (roughly, the hypothetical moment when computers will be more intelligent than humans).

Recently, they published (with a lot of media hype) the news that a Natural Language Processing model they developed has obtained incredible results. In particular, starting from a short prompt, the model can generate long paragraphs of text telling a story inspired by that prompt (inventing a lot of details, interviews, …). Interestingly, the company decided not to openly publish the model itself (they "erased" it). They did so, according to them, because the model posed a risk that was too high to accept. You can read their motivations here, and some critical commentary on that decision by Robert Munro (and by Anima Anandkumar and by Zachary C. Lipton).
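To get a concrete feel for what "generating long paragraphs from a short prompt" means in practice, here is a minimal sketch of prompt-based text generation. It assumes the model in question belongs to the GPT-2 family and uses the small, publicly released checkpoint through the Hugging Face transformers library; the model name, prompt, and generation parameters are illustrative choices of mine, not part of OpenAI's announcement.

    # Minimal sketch: prompt-based text generation with a small, publicly
    # available GPT-2 checkpoint (an assumption; the full model was withheld).
    from transformers import pipeline

    generator = pipeline("text-generation", model="gpt2")

    prompt = "Scientists discovered a herd of unicorns living in a remote valley."
    outputs = generator(prompt, max_length=120, num_return_sequences=1)

    # The model continues the prompt, inventing details, quotes, "interviews", ...
    print(outputs[0]["generated_text"])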

We are still not in a position to dive into that conversation, but we can at least try to repeat the why game and elicit what assumptions (implicit or explicit) ground that decision and those criticisms.

With a little bit of digging, some interesting keywords emerged:

  • fake news: the OpenAI software would allow ill-intentioned people to produce fake information at scale (take note, scale will be an important concept later on) very easily.
  • scrutiny: not opening the model to the research community impedes researchers' capacity to understand the model and counteract it (e.g., developing ways of discerning between text written by a human and text written by the software).

Please take your litter home

£2,500 maximum penalty for littering in the Brecon Beacons National Park. The Powys County Council sign is in a picnic and parking area, SE of the Storey Arms Centre. By Jaggery

In both the CA and the OpenAI cases, I am omitting a lot of interesting steps in our why digging, and dropping very interesting assumptions and points.

Here, though, we will focus on only some of these keywords, as they highlight a particular aspect of the ethics dialogue about data science.

What do privacy, marketing, propaganda, fake news, and scrutiny have in common?

We can answer by introducing the concept of Infosphere developed by Luciano Floridi. The infosphere is the environment that surrounds, wraps, and penetrates us. As there is an atmosphere, the thin layer of air that surrounds the Earth and allows us to breathe, (so far) supporting life, so there is a "layer" of information that allows us to interact with anything else (including ourselves) and supports society. This environment is made of information.

OpenAI's and Cambridge Analytica's decisions have an effect on the infosphere. Collecting some information or not2 (and how), using that information or not to profile a person or to produce some text, conveying a personalised bit of propaganda to one user or publishing an accurate (or decidedly false) automatically generated article on a media outlet, making accessible or not the details of a data science product: all of these decisions contribute to shaping the infosphere. All of these decisions are ethically relevant. All of them can be analysed, criticised, approved, disapproved, cheered or booed.

When assessing the ethical implications of a data science project (or product), we will take into account its interactions with the infosphere. What information does the project use as a resource? What information is targeted? What information is produced?

A diagram of the relationships between the infosphere and the ethical agent, from Luciano Floridi's The Ethics of Information: information as a product, information as a target, information as a resource.

In the Cambridge Analytica case, for example, the concern about privacy is mainly a concern about the targeting of information: it regards the unauthorized access to some (personal) information; and that is the case independently of any possible use of that data (people feel violated in the same way they would feel violated if somebody accessed their house without consent). The concern about marketing, propaganda, and fake news is mainly a concern about information-as-a-product: we are concerned that our informational environment is polluted and may be even more polluted by "toxic narratives" and "garbage news" (the choice of words is telling). The concern about scrutiny can be framed in terms of information-as-a-resource, because it regards the availability of some information to a particular community and its accuracy3.

No decision in designing and carrying out a data science project can be completely "neutral": in fact, every decision defines a particular configuration of the three information arrows (information-as-product, -as-target, -as-resource), and every decision changes the infosphere. This is true also when we decide not to do something. For example, the decision of OpenAI not to publish their model is ethically relevant. Knowing that something exists (or existed) but not having access to it is different from not knowing at all that it ever existed. The decisions of OpenAI to first work on that project (a project that, if successful, would have had certain ethical consequences), then to widely publicise their outcomes on media outlets, and finally to deny access to their data science product are deeply different from the decision of not starting that project at all.

We can analyse the differences, and get a more detailed assessment of their ethical consequences, by considering how the ethical agents (OpenAI and the researchers working for OpenAI) interacted, at each step, with the infosphere: what information we and they had access to before and after, what information was targeted in the process, what information was produced (by OpenAI and the media outlets), and what information could have been produced if they had released the results (by "good" and "ill" intentioned users, by the research community analysing the data science product, …).

Notes

1: Here, we are not trying to undermine the importance of establishing culpability (in the judicial system or in other reparation processes). Nor to put blame on feeling guilty. We are trying to put the focus on what we can do, starting now.

2: Importantly, OpenAI decided to collect the information (in jargon, to train the model) using Reddit conversations. Reddit is a specific online community (or ensemble of online communities), with specific cultural traits. Adrienne Massanari and many other researchers have already pointed out the risks of confusing Reddit with a "neutral" dataset, given the presence of pervasive "toxic technocultures" (such as anti-feminist and misogynistic activism).

3: No ethical issue can be described by only one arrow, and that is why we used the qualifier "mainly" in the text. Ethical problems in the wide wild world involve all three arrows at the same time.