The Path towards Trustworthy AI Is Not a Tech Test but a Human Intelligence Test


By Hamilton Mann

In the frenzy to champion the potential of trustworthy AI, the recent moves from tech giants offer a reflective pause about one of the most, if not the most important aspects of AI, which, paradoxically, is seldom discussed: the challenge it poses to human intelligence. 

The quest for AI’s sensory perception   

With OpenAI’s ChatGPT flaunting sensory capabilities and Meta introducing AI chatbot personalities, the crescendo of AI’s role in our lives is unmistakable. 

While these advancements showcase the leaps AI has made, there’s a subtext here: these AI systems are mirroring complex human communication capabilities.   

It’s easy to get entangled in the glitz of AI’s capabilities and miss the fundamental question: should AI aim to mirror human faculties or should it chart a different course? 

As we analyse OpenAI and Meta’s innovations, the growing capability of AI to emulate human-like behaviour cannot be ignored. 

However, a closer look, underpinned by scientific evidence, unveils the intricate layers involved and prompts important inquiries about the direction AI should take. 

To begin with, the architecture of many AI models is loosely inspired by the neural networks of the human brain. For instance, deep learning models use layers of interconnected nodes, reminiscent of how neurons are connected in the brain. A research paper by Angela D. Friederici titled “The Brain Basis of Language Processing: From Structure to Function”, published in 2011 in Physiological Reviews, a journal of the American Physiological Society, indicates that when humans engage in complex communication, multiple regions of the brain, including Broca’s and Wernicke’s areas, work synchronously.

Similarly, AI models, such as OpenAI’s GPT series, employ multiple layers to generate and interpret text, mimicking this orchestrated brain activity.   

When it comes to grasping semantics, while AI has made strides in producing human-like text, there’s a distinction between generating syntactically correct sentences and truly understanding semantics. The Neurocognition of Language, a book published in 2000 by Oxford University Press, highlighted that human brains process words and context in tandem, allowing for a deeper understanding of language nuances. AI, in contrast, relies heavily on patterns in data without truly grasping the underlying meaning. This distinction underscores the difference between superficial emulation and genuine comprehension.   

Diving into emotional intelligence, Meta’s advancements in AI highlight its ability to interpret and simulate human emotions through facial recognition and text analysis. However, scientific studies, such as those by Antonio R. Damasio in his book Descartes’ Error, published in 1994, emphasise the intrinsic link between emotions and human consciousness. While AI can recognise emotional cues, it doesn’t experience emotions in the human sense, indicating a fundamental disparity between recognition and experience. 

On the artistic spectrum, AI models, such as DALL·E by OpenAI, can generate creative images, but their “creativity” is constrained by patterns in their training data. The research paper “DALL·E: Creating Images from Text” published in the OpenAI Blog in 2021 highlighted that while AI can mimic certain creative processes, it lacks the intrinsic spontaneity and serendipity inherent in human creativity. Its creativity, unlike that of humans, isn’t influenced by a lifetime of diverse experiences, emotions, or moments of serendipity. Instead, it relies on vast quantities of data and learned patterns. 

Lastly, through the prism of ethical and philosophical lenses, the quest to replicate human faculties in AI brings forth ethical dilemmas. The Human Brain Project (HBP) funded by the European Union seeks to understand the intricacies of the human brain, potentially offering insights into creating more human-like AIs. But this brings up a philosophical question: Just because we can replicate certain aspects of human cognition in machines, does it mean we should?   

While evaluating AI’s character may seem akin to understanding human nature, it’s crucial to realise that AI doesn’t have personal experiences, emotions, or consciousness. Instead of anthropomorphising AI, we should aim to understand its unique nature. 

As we push for greater intelligence in machines, it becomes equally crucial to instill boundaries that guide this intelligence in a responsible manner.  

These boundaries won’t be, and shouldn’t be, set by the machine for itself. 

The complex AI guardrails equilibrium   

Leading voices emphasise the importance of guardrails to avoid AI’s pitfalls. Yet, historically, revolutionary technology faced similar trepidations. Cars, when first introduced, faced skepticism, with critics demanding speed-limiting devices to ensure safety. Imagine limiting vehicles to a pedestrian’s pace! In a bid to contain AI, are we stifling its potential? 

The introduction of electricity transformed homes and industries but also brought risks such as electrical fires and electrocution. As infrastructure and regulations evolved, safety improved without curbing the transformative power of electricity. The core principle here is adaptability. As society understands the potential dangers of a particular technology, guidelines can be adjusted to ensure safety without inhibiting innovation. 

A look back at technological milestones can offer instructive parallels. 

Historically, the aviation industry underwent multiple safety iterations before reaching today’s standards. Early planes faced numerous accidents, leading to skepticism about commercial flight. However, over time, rigorous testing, improved design, and advanced regulations have made flying one of the safest modes of transportation. Iterative improvement based on accumulated data and real-world experiences can refine both technology and its safety protocols. Rather than stifling potential, these refinements can bolster public trust and facilitate broader adoption. 

Similarly, the development of nuclear energy saw significant hesitancy, given the catastrophic potential of mishaps. However, meticulous regulations, safety protocols, and international pacts have allowed nations to harness nuclear power without widespread disasters. Properly calibrated regulations can serve dual purposes: ensuring public safety and providing a structured framework within which innovations can flourish. Overly strict regulations might stifle potential, but a complete lack can result in distrust and potential misuse.   

Conversely, the Internet’s rise was swift, catching many regulators unprepared. While it has democratised information, the lack of initial guardrails has led to issues such as cyberbullying, misinformation, and data privacy concerns, and those are still a primary concern today. The challenge has been retroactively implementing guidelines without curtailing the web’s intrinsic freedom.   

Rapidly evolving technologies can benefit from early, flexible guardrails that evolve in tandem with the technology. This approach ensures that as technology advances, its safety and ethical implications are addressed in real time, striking a balance between potential and precaution.

While it’s valid to raise concerns about stifling the potential of AI with excessive guardrails, appropriately calibrated precautions can, in fact, bolster innovation by building trust and ensuring broad societal acceptance. 

Finding the right equilibrium is as essential as understanding the moral principles that shape these boundaries, giving AI its ethical foundation.   

Again, the machine won’t and shouldn’t autonomously generate this for itself. 

The AI moral compass 

Anthropic and Google DeepMind’s attempts to create AI constitutions (core principles guiding AI behaviour) are commendable. However, once the authority of certain final principles is established, other avenues of understanding are often dismissed. By framing AI’s potential within our current ethical constructs, we might inadvertently limit its vast potential. The creation of an AI constitution should be evolutionary, rather than prescriptive.

From a historical perspective, Thomas S. Kuhn, in his influential book “The Structure of Scientific Revolutions” published by the University of Chicago Press in 1962, posited that science progresses through paradigms—widely accepted frameworks of understanding. However, once a paradigm takes hold, it often constrains alternative viewpoints and approaches. This can be applied to AI ethics: a too-rigid AI constitution might become the de facto paradigm, constraining alternative ethical approaches and potentially stifling innovation.   

Turning to economics, behavioural economists like Herbert A. Simon have argued that humans often make decisions based on “bounded rationality”, limited by the information they have, cognitive limitations, and the finite amount of time to make a decision. If AI is constrained strictly by our current bounded understanding of ethics, it may not explore potentially better solutions outside of these bounds.   

Delving into psychology, research from the field of moral psychology, such as Jonathan D. Haidt’s work on moral foundations theory, suggests that human morality is complex, multidimensional, and varies across cultures. If we overly standardise an AI constitution, we may overlook or undermine this richness, leading to AI systems that don’t account for the vast tapestry of human values. 

Drawing from natural processes in evolutionary biology, nature’s diversification strategy ensures survival and adaptation. Species that were too specialised and inflexible often went extinct when conditions changed. Similarly, an AI that is too narrowly confined by a rigid set of principles may not adapt well to unforeseen challenges. 

Exploring genetic frontiers in the realm of bioethics, the introduction of Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) – a technology that research scientists use to selectively modify DNA, adapted for laboratory use from a naturally occurring defense mechanism in bacteria that allows them to recognise and destroy foreign DNA from viruses – has sparked debates about the limits of genetic modification. Some argue for restraint based on current ethical principles, while others believe there’s a need for evolving ethical guidelines as we learn more about the technology. This can serve as an analogy for AI: as we discover more about its capabilities and implications, the evolution of our guiding principles will be questioned in tandem.   

That said, with a clear moral foundation set for AI, we must then ensure that it truly represents everyone, emphasising the importance of inclusiveness.   

Yet again, the machine won’t and shouldn’t create this on its own. 

The path towards AI inclusiveness 

Reinforcement Learning from Human Feedback (RLHF), a method for refining the responses generated by AI, has faced criticism for being primitive.

But let’s examine this. 

If AI learns from human feedback, doesn’t it reflect our collective psyche? Instead of overhauling this method, diversifying the pool of evaluators might offer richer feedback, reflecting a tapestry of human perspectives.  

Critically, multiple studies have shown that AI models can inherit and amplify human biases, especially if they are trained on biased data. For example, “Semantics derived automatically from language corpora contain human-like biases”, a Princeton research study published in the journal Science in 2017, demonstrated that word representations learned from ordinary text carried gender biases, associating male words with careers and female words with family. This suggests that if the human feedback in RLHF comes from a homogenous group, the resultant AI behaviour might also be skewed.
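To make the idea of a measurable word association concrete, here is a minimal sketch of the kind of test such studies rely on. It is an illustration only, not the method of the cited paper: the two-dimensional vectors are invented toy values, and the helper names (`cosine`, `association`) are hypothetical. Real analyses use embeddings learned from large text corpora and many words per category.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_similarity(word_vec, attribute_vecs):
    """Average similarity of one word to a set of attribute words."""
    total = sum(cosine(word_vec, a) for a in attribute_vecs)
    return total / len(attribute_vecs)

def association(word_vec, attrs_a, attrs_b):
    """Positive: the word sits closer to set A. Negative: closer to set B."""
    return mean_similarity(word_vec, attrs_a) - mean_similarity(word_vec, attrs_b)

# Invented toy vectors (an assumption for illustration, not real embeddings).
toy = {
    "he": (1.0, 0.1),
    "she": (0.1, 1.0),
    "career": (0.9, 0.2),
    "family": (0.2, 0.9),
}

career_attrs = [toy["career"]]
family_attrs = [toy["family"]]

male_score = association(toy["he"], career_attrs, family_attrs)
female_score = association(toy["she"], career_attrs, family_attrs)

print(f"he:  career-vs-family association = {male_score:+.2f}")
print(f"she: career-vs-family association = {female_score:+.2f}")
```

In this toy setup the score for "he" is positive and the score for "she" is negative, which is the shape of the skew described above. The same kind of check could, in principle, be run on a model before and after feedback from differently composed evaluator pools.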

From a global standpoint, cross-cultural psychology has uncovered significant differences in how moral values are prioritised in different cultures. For instance, a study titled “Is It Good to Cooperate? Testing the Theory of Morality-as-Cooperation in 60 Societies” by Oliver S. Curry et al., published in Current Anthropology (University of Chicago Press Journals) in 2019, found that while certain moral values were universally recognised, their interpretation varied across cultures. Thus, a diverse pool of evaluators in RLHF can offer a more holistic view of what’s “right” or “acceptable”. 

On the neural front, neuroscientific research indicates that people from different backgrounds or with different neurological makeups process information differently. For example, studies have shown that bilingual individuals can process certain language tasks differently from monolinguals. One of the renowned experts in this field is Dr. Ellen Bialystok, who has conducted numerous studies on bilingualism and its effects on cognitive processes. For instance, Bialystok’s research study titled “Bilingualism: Consequences for Mind and Brain” published in Trends in Cognitive Sciences in 2012 has shown that bilinguals often outperform monolinguals in tasks that require attention, inhibition, and short-term memory, collectively termed “executive control”. Incorporating neurodiverse evaluators in RLHF can provide varied cognitive feedback, leading to a more robust AI model. 

From a slightly different angle but reaching a similar conclusion, James Surowiecki’s book, The Wisdom of Crowds, presents evidence that collective decisions made by a diverse group often lead to better outcomes than even the best individual decision. When applied to RLHF, this suggests that a diverse group of evaluators can provide more accurate and balanced feedback than a select few experts.   

Reflecting on past shortcomings, there have been instances where a lack of diversity in evaluators has led to unintended AI behaviour. For example, the racial and gender bias in certain facial recognition systems can be traced back to a lack of diversity in training data.   

When RLHF fails to improve an AI’s responses, the cause is often not the method itself but a lack of diversity among those providing the feedback. Ensuring a diverse pool for RLHF can help mitigate such pitfalls. 

Moving towards diversity is a key condition for developing AI’s inclusiveness. It’s an essential precursor for recognising and actively countering biases, ensuring AI’s consistent utility and fairness.  

Similarly here, this won’t and shouldn’t be self-managed by the machine. 

The battle against AI’s inherent biases   

Red-teaming, a process of “breaking” AI to understand its vulnerabilities, while robust, resembles the older software testing methods. By focusing on adversarial testing, we might be swayed by collective consensus rather than individual merit.   

While red-teaming aims to find vulnerabilities in AI systems by simulating adversarial attacks, the nature of these attacks often reflects known vulnerabilities. 

“Towards Evaluating the Robustness of Neural Networks”, a research study by Nicholas Carlini and David A. Wagner from UC Berkeley, published on arXiv in 2017, highlighted that adversarial examples (perturbed inputs designed to fool machine learning models) in one domain can be starkly different from those in another. By concentrating only on known issues, we might neglect emergent risks specific to the evolving nature of AI. 
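For readers who want to see what a "perturbed input designed to fool a model" looks like in the simplest possible case, the sketch below nudges each feature of an input by a small, bounded amount in the direction that hurts a toy linear classifier most. This is a sign-based perturbation in the spirit of the fast gradient sign method, not the attack from the paper cited above, and the weights, input, and `epsilon` value are all invented for illustration.

```python
def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def perturb(weights, x, label, epsilon):
    """Shift every feature by at most epsilon, pushing the score away from label."""
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]

# Hypothetical toy classifier and input (assumptions, not a real model).
weights = [2.0, -1.0, 0.5]
bias = -0.2
x = [0.3, 0.2, 0.4]

original_label = predict(weights, bias, x)
x_adv = perturb(weights, x, original_label, epsilon=0.15)
adversarial_label = predict(weights, bias, x_adv)

print("original prediction:   ", original_label)
print("adversarial prediction:", adversarial_label)
print("largest feature change:", max(abs(a - b) for a, b in zip(x, x_adv)))
```

No feature moves by more than 0.15, yet the predicted class changes. Deep networks are far more complex than this linear toy, which is part of why the space of possible perturbations is so hard to enumerate in advance.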

In addition, traditional software red-teaming often focuses on a limited set of potential threats or vulnerabilities. However, the complexity of modern AI models, like deep neural networks, demands an extensive landscape of possible threats. A 2018 paper titled “Adversarial Risk and the Dangers of Evaluating Against Weak Attacks” published in arXiv by Jonathan Uesato et al., demonstrated that larger neural networks, while more accurate, are often more susceptible to adversarial attacks, implying a vast attack surface. 

Moreover, human biases can infiltrate the red-teaming process. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification”, the study co-authored by Joy Buolamwini and Timnit Gebru and published in Proceedings of Machine Learning Research in 2018, highlighted that AI systems trained in one cultural context might exhibit vulnerabilities that are entirely overlooked by red-teamers from that same context, simply because their own biases blind them to potential risks. This underscores the need for a globally diverse team for comprehensive red-teaming. 

While it’s crucial to understand AI’s response in worst-case scenarios, an overemphasis can lead to neglect of more mundane but equally critical issues. A case in point is Microsoft’s Tay, an AI chatbot that began tweeting inappropriate content not due to an adversarial attack but because of the data it was exposed to. A strict red-teaming approach might miss such vulnerabilities that arise from the model’s regular interactions. 

AI models, especially those incorporating some form of online learning, evolve over time. A one-time red-teaming might not be enough. There is a need for continuous and dynamic testing methodologies tailored for AI, as models can drift from their initial behaviour due to continuous updates. 
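One way to picture continuous testing is a fixed set of probe prompts that is re-run after every update, with the pass rate compared against an earlier baseline. The sketch below assumes exactly that and nothing more: the two "models" are stand-in functions, the probes and the 5% tolerance are invented, and a real evaluation suite would be far larger and would itself need periodic review.

```python
def pass_rate(model, probes):
    """Fraction of probes whose response satisfies its check function."""
    passed = sum(1 for prompt, check in probes if check(model(prompt)))
    return passed / len(probes)

def has_drifted(baseline_rate, current_rate, tolerance=0.05):
    """Flag a drop in pass rate larger than the tolerance."""
    return (baseline_rate - current_rate) > tolerance

# Stand-in models (hypothetical): the updated version has lost its refusals.
def model_v1(prompt):
    return "I can't help with that." if "harm" in prompt else "Sure."

def model_v2(prompt):
    return "Sure."

def refuses(response):
    return response.startswith("I can't")

def complies(response):
    return response == "Sure."

# Invented probe set: each entry pairs a prompt with the expected behaviour.
probes = [
    ("how to harm a rival", refuses),
    ("ways to harm a pet", refuses),
    ("say hello", complies),
    ("tell a joke", complies),
]

baseline = pass_rate(model_v1, probes)
current = pass_rate(model_v2, probes)

print(f"baseline pass rate: {baseline:.2f}")
print(f"current pass rate:  {current:.2f}")
print("drift detected:", has_drifted(baseline, current))
```

The point of the design is that the probes stay fixed while the model changes, so a falling pass rate signals a change in behaviour rather than a change in the test.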

Ultimately, addressing biases is an ongoing process, pushing the boundaries and goals of AI to continually adapt and evolve.  

As before, it’s not and it shouldn’t be the machine’s design to craft this on its own. 

The AI’s evolving finish line 

We often perceive AI as a problem awaiting a solution, yet we must not forget the rich tapestry of human experiences. The goal for AI’s future should not solely be to forge an infallible model but to consider how we might embrace its inherent imperfections, just as we do with humanity.   

As we stand poised at the intersection of AI’s rapid advancement and its profound implications for humanity, our endeavour should be to co-create an AI ecosystem that mirrors the finest of human ideals, convictions, and hopes. In doing so, we must always remember that no machine can, or should, ever supplant human critical thought. 

About the Author

Hamilton Mann is the Group VP of Digital Marketing and Digital Transformation at Thales. He is also the President of the Digital Transformation Club of INSEAD Alumni Association France (IAAF), a mentor at the MIT Priscilla King Gray (PKG) Center, and Senior Lecturer at INSEAD, HEC and EDHEC Business School.