What Is Ethical AI?
Ethical, or responsible, artificial intelligence (AI) “is the branch of the ethics of technology specific to artificially intelligent systems. It is sometimes divided into a concern with the moral behavior of humans as they design, make, use and treat artificially intelligent systems, and a concern with the behavior of machines, in machine ethics. It also includes the issue of a possible singularity due to superintelligent AI” (Wikipedia).
There are many things to address in this seemingly simple definition, raising both practical and profoundly philosophical questions. Whose ethics do we use? Who best represents and models these ethics? Should we expect artificial intelligence to “behave” better than we do as human beings? Do superior ethical standards in artificial intelligence put human beings at great risk of being surpassed by machines since we are immensely fallible?
The field of Ethical AI tries to address these questions even as we build real-world applications in the form of automated bots interacting directly with humans or silently running in the background, selecting suitable candidates for open job positions…or for potential incarceration. AI “choices” may have profound impacts on individuals and larger groups.
I have written about ethics and bias in a previous blog post, but I’m revisiting the topic in light of new articles and perspectives.
Sources of Bias
We all know “garbage in, garbage out,” but it can just as easily be “biases in, biases out.” Artificial intelligence relies on machine learning, and machine learning relies on quality data. “Machine learning systems are data-hungry. The more quality data they have, the more accurate they become” (TechTalks). The range of data is broad, but most of the data used to train machine learning is machine-readable text created for the benefit of human end users.
For example, a common practice is to tag images with textual descriptions of the objects and situations that appear in them. A computer can’t recognize a dog in a photo until it has been trained on thousands of images labeled with the concept “dog” (or with specific dog breeds). Similarly, a more complex idea like “leaping dog” may require a different set of training data indicating both the object and the action. In many cases, a misinterpreted image is trivial.
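As a rough illustration, here is a minimal Python sketch of what concept-labeled training data might look like. The field names, file paths, and labels are assumptions made for illustration, not any particular platform’s schema.

```python
# A minimal sketch of image training data labeled with concepts from a
# controlled vocabulary. File paths and labels are invented examples.
from dataclasses import dataclass, field

@dataclass
class LabeledImage:
    path: str                                       # where the image lives
    objects: list = field(default_factory=list)     # what appears, e.g. "dog"
    actions: list = field(default_factory=list)     # what is happening, e.g. "leaping"

training_data = [
    LabeledImage("img/0001.jpg", objects=["dog"]),
    LabeledImage("img/0002.jpg", objects=["dog", "beagle"]),
    # A compound idea like "leaping dog" needs both an object and an action label.
    LabeledImage("img/0003.jpg", objects=["dog"], actions=["leaping"]),
]

for example in training_data:
    print(example.path, example.objects, example.actions)
```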
Think for a moment about tagging photos for a Facebook post: the engine does a pretty good job of identifying faces in your images because you’ve tagged yourself and your friends before, and the algorithm has profile and other photos to use for comparison. You’ve probably seen misidentifications, however. Sometimes someone in your photo looks similar to someone else. Sometimes sunglasses, hats, or other features confuse the recognition. Sometimes the machine sees a face in wood grain or some other grouping of features. We may find a misidentification amusing until someone with a dark complexion gets tagged as a gorilla (The Verge). Imagine such technology analyzing crime-scene video and unwittingly flagging you as the perpetrator. Imagine not getting a job for which you are qualified because you are female (Reuters).
It is rarely the case that biases have been deliberately weaponized within machine learning algorithms. Rather, unconscious biases creep in. For example, if more examples of “good” resumes come from male applicants, it won’t take long for the algorithm to learn that male applicants are superior and throw out female applicants in the first round. Similarly, if you don’t train machine learning to recognize the full range of skin tones, it will compare dark-complexioned people to the most similar reference imagery it has, such as the human-like face of an ape. The people responsible for training machine learning are human, and they may not be aware of the unconscious predilections they are feeding into the machine.
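To make that mechanism concrete, here is a toy sketch, not any real hiring system, of how a skewed set of historical “good” resumes can leak gender bias into a naive scoring model. The sample resumes and the scoring method are entirely invented.

```python
# A toy illustration of bias leaking from skewed training data: score resumes
# by how often each token appeared in resumes previously labeled "good". If
# the historical "good" examples skew male, male-correlated tokens get
# inflated scores even though gender was never an intended feature.
from collections import Counter

historical_good = [
    "captain of chess club he led engineering team",
    "he managed software project he shipped release",
    "his thesis on distributed systems he published",
]
good_token_counts = Counter(tok for resume in historical_good for tok in resume.split())

def score(resume: str) -> int:
    """Sum of how often each token appeared in past 'good' resumes."""
    return sum(good_token_counts[tok] for tok in resume.split())

print(score("he built a compiler"))    # benefits from incidental tokens like "he"
print(score("she built a compiler"))   # identical qualifications, lower score
```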
So, from a taxonomy and ontology perspective, what can be done to try to prevent biases from manifesting themselves in machine learning and artificial intelligence applications?
How do knowledge organization systems fit in?
Knowledge organization systems (KOS) include controlled vocabulary structures like taxonomies and ontologies. These vocabularies provide a centralized, single source of truth for term values, including concept labels, definitions, and other information. Since machine learning requires quality data applied to content, which by nature is typically messy and unstructured (or semi-structured), the onus is on these controlled vocabularies to be an unbiased initial source.
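For readers less familiar with these structures, here is a minimal sketch of a single controlled vocabulary concept expressed in SKOS, assuming the rdflib Python library. The namespace URI and the concept itself are illustrative assumptions.

```python
# A minimal SKOS concept built with rdflib: one concept, one preferred label,
# one definition, kept in a single place so every downstream tagging or
# training pipeline draws on the same values.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

EX = Namespace("http://example.org/vocab/")  # illustrative namespace
g = Graph()
g.bind("skos", SKOS)

g.add((EX.Dog, RDF.type, SKOS.Concept))
g.add((EX.Dog, SKOS.prefLabel, Literal("Dog", lang="en")))
g.add((EX.Dog, SKOS.definition, Literal("A domesticated canine.", lang="en")))

print(g.serialize(format="turtle"))
```

Keeping labels and definitions in one governed graph like this is what makes the vocabulary a “single source of truth”—and also what makes any bias in it propagate so efficiently.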
It may seem implausible that concepts from a controlled vocabulary could be biased, but bias may lurk in the concept labels themselves and, even more likely, the relationships between concepts may create a graph of bias that is not apparent to human eyes but becomes obvious in application. Imagine problematic concepts like “Third World” or “developing nations” linked directly to country names by a hasCountry relationship. The concepts themselves may be a cause for concern, but deciding which countries belong to these categories is even more problematic and not a decision a vocabulary designer should necessarily be making.
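Here is a hedged sketch of that scenario as plain Python triples. The hasCountry relation, category labels, and country names are hypothetical; the “audit” only shows how such assertions can be surfaced for review rather than left implicit in the graph.

```python
# A sketch of how relationships, not just labels, can encode bias: asserting
# membership in a loaded category is an editorial judgment the vocabulary
# quietly bakes into every application that consumes it.
triples = [
    ("Developing nations", "hasCountry", "CountryA"),
    ("Developing nations", "hasCountry", "CountryB"),
    ("Industrialized nations", "hasCountry", "CountryC"),
]

# A trivial audit: list every country whose category assignment comes from
# the vocabulary itself rather than from a documented, authoritative source.
for subj, rel, obj in triples:
    if rel == "hasCountry":
        print(f"Review: '{obj}' is asserted to belong to '{subj}'")
```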
The problem needs to be addressed at the source, but also at the point of content creation and tagging. It is not just the use of single concepts that may lead to inaccurate training data, but concepts in relation to each other and in combination. If many photos of sports activities are tagged with “men” and “sports” (or specific sports names), machine learning will draw the conclusion that only men are associated with sports. In an application where personalized ads are surfaced for consumers based on their interests, men will see sports imagery but women may not. The intent was never to ignore that audience, but the tagging drives the resulting application.
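One practical check at the tagging stage is a co-occurrence audit. Below is a minimal sketch of such an audit; the tags and sample content are invented for illustration.

```python
# A sketch of a tag co-occurrence audit. If "sports" almost always appears
# alongside "men" and almost never alongside "women", downstream models will
# likely learn that association even though no one intended it.
from collections import Counter
from itertools import combinations

tagged_photos = [
    {"men", "sports", "soccer"},
    {"men", "sports", "basketball"},
    {"men", "sports", "running"},
    {"women", "cooking"},
]

pair_counts = Counter()
for tags in tagged_photos:
    for pair in combinations(sorted(tags), 2):
        pair_counts[pair] += 1

tag_counts = Counter(tag for tags in tagged_photos for tag in tags)

# Report how often other tags co-occur with "sports".
for (a, b), n in pair_counts.items():
    if "sports" in (a, b):
        other = b if a == "sports" else a
        share = n / tag_counts["sports"]
        print(f"'sports' co-occurs with '{other}' in {share:.0%} of sports photos")
```

A report like this won’t fix the tagging, but it makes the skew visible so that editors can decide whether the corpus, the vocabulary, or both need attention.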
What can we do?
“At the very least, companies should begin developing an AI ethics strategy that is documented and available for your viewing. In practice, this means one, that they’re providing you with a diverse crowd that includes various ages, races, geographies, languages, and gender to mitigate bias. The crowd should reflect the values you want incorporated into your data. Your data partner should also be committed to fair pay and fair treatment of their crowd workers, an often overlooked component of building ethical AI” (Appen).
Since we are addressing this from the point of view of knowledge organization systems, we can take the above advice and recognize that taxonomists and ontologists must be made aware of the possibility of introducing bias into controlled vocabularies, so that they carefully consider the concepts they add and the relationships they create. Additionally, taxonomy governance committees should review controlled vocabularies for potential bias, and these reviews should ideally draw on individuals with a diversity of perspectives.
Likewise, the applications that a KOS feeds should be reviewed for accuracy. Are tags used in combination on content creating false relationships? Are the machine learning algorithms generalizing inaccurate or incomplete training data to new situations? All of the consuming downstream systems may take on a new bias of their own based on the channel and context of presentation (YahooLife), so it is incumbent on the creators of consuming applications to interface with the creators of upstream data sources.
Like controlled vocabulary governance, the review for bias is never complete. “If we get past the budget and time issues, lack of strong governance can cause Responsible AI principles not to be met. Responsible AI isn’t just a one-time exercise, rather it’s a stream or discipline with continuous effort and dedication” (CMSWire). Those who create the KOS and those who review it and govern it must maintain a program with clear principles and actions.
Organizations must also look outside themselves in order to understand their industry and the way they fit in other contexts, including current laws. “We will likely need new laws to regulate artificial intelligence, and some lawmakers are catching up on the issue. There’s a bill that would force companies to check their AI systems for bias through the Federal Trade Commission (FTC)” (Vox). From a taxonomist’s perspective, this requires additional research beyond checking concept labels, their usage, and their definitions. Again, this is particularly important in creating graphs of relationships, which may take on new meanings of their own.
As information systems increase in complexity, so too do the considerations for building knowledge organization systems which power multiple systems within an information architecture. It’s up to us as practitioners to understand our role in creating ethical artificial intelligence.