The Human Factor Is Essential to Eliminating Bias in Artificial Intelligence

It is not enough to open the ‘black box’ of machine learning. Direct human evaluation is the only way to ensure biases are not perpetuated through AI, argues Elizabeth Isele.

Expert comment. Updated 8 July 2020. 2 minute read.
The Watson robot is displayed at the IBM stand at a digital technology trade fair in Hanover, Germany. Photo: Sean Gallup/Getty Images.


More and more technology and digital services are built upon, and driven by, AI and machine learning. But as we are beginning to see, these programmes are starting to replicate the biases which are fed into them, notably biases around gender. It is therefore imperative that the machine learning process is managed from input to output – including data, algorithms, models, training, testing and predictions – to ensure that this bias is not perpetuated.

Bahar Gholipour describes this as AI’s so-called ‘black box’ problem: our inability to see inside an algorithm and therefore understand how it arrives at a decision. Gholipour warns that ‘left unsolved, it can devastate our societies by ensuring that historical discrimination, which many have worked hard to leave behind, is hard-coded into our future.’

Technological expertise is not enough to scrutinize, monitor and safeguard each stage of the machine learning process. The experience and perspective of people of all ages and all walks of life are needed to identify both obvious and subliminal social and linguistic biases, and to make recommendations for adjustments that build accuracy and trust. Even more important than having an opportunity to evaluate gender bias in the ‘black box’ is having the freedom to correct the biases discovered.

The first step is to open the ‘black box’. Users are increasingly demanding that AI be honest, fair, transparent, accountable and human-centric. But proprietary interests and security issues have too often precluded transparency. However, positive initiatives are now being developed to accelerate open-sourcing code and create transparency standards. AI Now, a nonprofit at New York University advocating for algorithmic fairness, has a simple principle worth following: ‘When it comes to services for people, if designers can’t explain an algorithm’s decision, you shouldn’t be able to use it.’

A number of public and private organizations are now beginning to take this seriously. Google AI has several projects that push the business world, and society, to consider the biases in AI, including GlassBox, Active Question Answering and its PAIR initiative (People + AI Research), which add manual restrictions to machine learning systems to make their outputs more accurate and understandable.

The US Defense Advanced Research Projects Agency is also funding a big effort called XAI (Explainable AI) to make systems controlled by artificial intelligence more accountable to their users.

Microsoft CEO Satya Nadella has also gone on the record defending the need for ‘algorithmic accountability’ so that humans can undo any unintended harm.

But laudable as these efforts are, opening the box and establishing regulations and policies to ensure transparency are of little value until a human agent examines what’s inside to evaluate whether the data is fair and unbiased. Automated natural language processing alone cannot do it because language is historically biased – not just basic vocabulary, but associations between words, and relationships between words and images.
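
To make the point about associations between words concrete, here is a minimal Python sketch that counts how often an occupation word shares a sentence with ‘he’ or ‘she’. The four-sentence corpus and the function name `pronoun_cooccurrence` are hypothetical, invented purely for illustration; a real audit would run over the actual training text, and a human reviewer would still have to judge what any skew means in context.

```python
from collections import Counter
import re

# Toy corpus, invented for illustration only (not real training data).
CORPUS = [
    "He is a doctor and he works long hours.",
    "She is a nurse at the same hospital.",
    "The doctor said he would call back.",
    "She asked the nurse for help.",
]

def pronoun_cooccurrence(sentences, target):
    """Count the sentences in which `target` appears alongside 'he' or 'she'."""
    counts = Counter()
    for sentence in sentences:
        # Lower-case the sentence and split it into a set of distinct words.
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if target in words:
            for pronoun in ("he", "she"):
                if pronoun in words:
                    counts[pronoun] += 1
    return counts

for occupation in ("doctor", "nurse"):
    print(occupation, dict(pronoun_cooccurrence(CORPUS, occupation)))
```

In this toy corpus ‘doctor’ only ever appears with ‘he’ and ‘nurse’ only with ‘she’. A system trained on such text would inherit that association unless someone notices and corrects it.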

Semantics matter. Casey Miller and Kate Swift, two women who in 1980 wrote The Handbook of Nonsexist Writing – the first handbook of its kind – dedicated their lives to promoting gender equity in language. That was almost 40 years ago and, while technology has advanced exponentially in that time period, we’ve made little progress removing gender bias from our lexicon.

The challenge for AI is in programming a changing vocabulary into a binary numerical system. Human intervention is necessary to adjudicate the bias in the programmer, the context and the language itself. But gender bias is not just in the algorithms. It lies within the outcomes – predictions and recommendations – powered by the algorithms.
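
As a rough illustration of auditing outcomes rather than code, the sketch below compares how often a system recommends people from two groups. It assumes a hypothetical audit log (`DECISIONS`) and hypothetical helper names (`selection_rates`, `disparity_ratio`); the numbers are made up, and deciding which gap counts as unfair remains a human judgement.

```python
from collections import defaultdict

# Hypothetical audit log: (group label, whether the system recommended the person).
# The values are invented for illustration only.
DECISIONS = [
    ("women", True), ("women", False), ("women", False), ("women", False),
    ("men", True), ("men", True), ("men", False), ("men", False),
]

def selection_rates(decisions):
    """Return the share of positive recommendations for each group."""
    totals = defaultdict(int)
    selected = defaultdict(int)
    for group, recommended in decisions:
        totals[group] += 1
        if recommended:
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}

def disparity_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means equal rates."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was recommended, so there is no gap to measure
    return min(rates.values()) / highest

rates = selection_rates(DECISIONS)
print(rates, disparity_ratio(rates))
```

Here women are recommended a quarter of the time and men half of the time, so the ratio is 0.5. The figure only flags a gap; it takes people with the relevant experience to decide whether the gap reflects bias and what to change.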

Common stereotypes are even being reinforced by AI’s virtual assistants: those tasked with addressing simple questions (e.g. Apple’s Siri and Amazon’s Alexa) have female voices while more sophisticated problem-solving bots (e.g. IBM’s Watson and Salesforce’s Einstein) have male voices.

Gender bias is further exacerbated by the paucity of women working in the field. AI Now’s 2017 report identifies the lack of women, and ethnic minorities, working in AI as a foundational problem that is most likely having a material impact on AI systems and shaping their effects in society.

Human agents must question each stage of the process, and every question requires the perspective of a diverse, cross-disciplinary team to audit and monitor the system and what it generates. That team should represent both the public and private sectors and be inclusive of race, gender, culture, education, age and socioeconomic status. Its members don’t need to know the answers – just how to ask the questions.

In some ways, 21st-century machine learning needs to circle back to the ancient Socratic method of learning, based on asking and answering questions to stimulate critical thinking, draw out ideas and challenge underlying presumptions. Developers should understand that this scrutiny and reformulation helps them clean identified biases from their training data, run ongoing simulations based on empirical evidence and fine-tune their algorithms accordingly. This human audit would strengthen the reliability and accountability of AI and, ultimately, people’s trust in it.