DATA 471 The Trustworthy Datascientist

Data Science as carpentry
According to philosopher Ian Bogost, carpentry is the activity of “making things that explain how things make their world.” You may want to re-read this definition a few times: behind its simplicity there is quite a lot to unpack. Just a couple of hints. “Making their world” can be related to the notion of ontological commitment that we discussed last week: the way in which we interact with the world makes the world, for us.
Oaths, principles, checklists
A large variety of workers, in fields as diverse as psychology, engineering, justice, medicine, journalism, … have developed oaths stating their commitment to a set of ethical principles. Data scientists from all over the world are also elaborating, discussing, and adopting such oaths. During the weekend, take a look at some examples: the Data 4 Democracy initiative (a large, international, grassroots gathering of data scientists across industry and academia) elaborated an ethical framework presented as a pledge (see also the manifesto).
On Monday we are going to start a discussion that may take a while to work through. The big question is dangerously easy to pose: who owns the data? Yet it is not an easy question to answer, because it involves many different points of view and epistemological dimensions (that is, “ways of knowing the world”). To start with, let’s listen to this podcast episode by Walter Vannini: “DK_en 1x03 - The Mother Of All Datagrabs” on Spreaker.
TAF is tough, but it is worth it
This week we are going to talk about fairness, and in particular the ethical request to be treated fairly by algorithms. We are going to consider a small handful of materials, or rather a subset of them. Read the points about fairness here: http://www.fatml.org/resources/principles-for-accountable-algorithms Done? Let’s move to a completely different kind of source. Recently, Alexandria Ocasio-Cortez spoke about algorithms. Her remarks were widely reported by the media and discussed in depth (a web search will turn up many commentaries).
Week 5 opens the segment about “requests”: what do data subjects (be they individuals, communities, interest groups, or any other informational organism in our infosphere) want?
Agency
Agency and autonomy are (complex) notions that speak about the right of an agent (an informational organism in the infosphere) to determine their own direction, to make decisions freely, to self-govern. When thinking about these, we may want to keep in mind that informational organisms do not exist in isolation, but are part of large, convoluted networks of interactions: what does it mean to have autonomy in, and as, a community?
Hands-on-data lab
Opaque, transparent, and everything in between
Anatomy of AI
Critical!
No reading this week, but something good to watch. We will be talking about recommender systems, homophily, assortativity, weapons of math destruction, and more, starting from this 40-minute (plus intro and question time) talk by Prof. Wendy H. K. Chun:
activity
Prof. Chun identifies 4 key steps to overcome discriminatory uses of data science. Summarise them in your own words (< 4 lines).
What is a recommender system?
Lab 3: bias
Credit: Joy Buolamwini
Logistics
Room: This Monday, 4th of March, the laboratory is going to be in Jack Erskine 442 and not in the usual Jack Erskine 244, as we are going to work on computers.
Stencila
The data science hands-on-data lab will be run on Stencila Hub. Stencila Hub is an online (and desktop) suite for reproducible research. All activities will be completely hosted online, so you don’t need to install anything if you don’t want to (all you need is a recent web browser).
Week 2
Warning: I’m running late writing these notes; this is only a stub, meant to help you keep track of what we have done. I hope to complete them very soon. And, remember, you can help me: the source material is at www.gitlab.com/gvdr/data471/content/material
The Digital Poor House. Or, Is Data Neutral?
Lab 2: The actor-network account of the Allegheny Family Screening Tool debacle
The poor house
In the pre-lab activities for this week, I invited you to read Virginia Eubanks’s article for Wired: it is a long excerpt from her book “Automating Inequality”.
Digital poverty cages
In the next discussion session we are going to talk about poverty cages: what are they? How are they built? What is the impact of their digitalization? What role do data science products play in their development?
readings
mandatory: A Child Abuse Prediction Model Fails Poor Families, by Virginia Eubanks
if you want more: High-Tech Homelessness, by Virginia Eubanks
reflection question
Sketch out your answer before the lab, and think about it again afterwards.