You probably haven’t heard of human-centered evaluation of LLMs, and that needs to change. Human-centered work seeks to understand how real humans interact with technology: in this case, how humans (with all of their cognitive biases and quirks) interact with LLMs, and how these models affect individual human decision-making.
What are LLMs?
In the past year, Large Language Models (LLMs) have exploded in popularity, from research to industry to public awareness and accessibility. For instance, ChatGPT set historic records for customer growth, reaching over 100 million users in its first two months. These models predict the next token (a character, word, or piece of a word) based on the preceding context, generating free-form text for a wide variety of tasks in almost any specified style. This means that people can weave LLMs into their daily lives: to decide what to eat for breakfast, to write responses to emails left unanswered from yesterday, to develop the sales pitch they have to present mid-morning, to generate a funny joke during a break from work, and so on.
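To make that "predict the next token" loop concrete, here is a minimal sketch using the open-source Hugging Face transformers library, with GPT-2 standing in for a larger LLM. The prompt, model choice, and greedy decoding are illustrative assumptions, not a recommendation.

```python
# A minimal next-token generation loop (assumes the `transformers` and `torch`
# packages are installed; GPT-2 is a small stand-in for larger LLMs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "For breakfast this morning, I think I'll have"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generate one token at a time: the model scores every token in its
# vocabulary given the context so far, and we append the most likely one.
for _ in range(20):
    with torch.no_grad():
        logits = model(input_ids).logits            # (batch, seq_len, vocab_size)
    next_token = logits[:, -1, :].argmax(dim=-1)    # greedy choice of next token
    input_ids = torch.cat([input_ids, next_token.unsqueeze(-1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```

Production systems typically sample from the predicted distribution rather than always taking the single most likely token, which is what gives chat-style LLMs their variety.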
A variety of concerning issues with LLMs have already been identified, such as biased, toxic, or hallucinated outputs, but these evaluations largely reflect only distributional or instance-level properties of model outputs. The potential ubiquity of these tools means that we also need to consider how humans will actually interact with and use this new technology, while acknowledging that we are all prone to cognitive biases and other quirks. This area of research is referred to as human-centered evaluation, and it has not yet been thoroughly explored for LLMs. It is, however, already popular in the Explainable AI (XAI) community.
What is Explainable AI?
Defining explainability for ML models is a subject of ongoing discussion. For the purposes of this discussion, we will focus on the most common type of model transparency seen in industry: post-hoc explanations of black-box models. These methods typically rely only on a trained model’s inputs and outputs to identify patterns in how the model makes decisions, with the goal of giving stakeholders enough transparency into that decision-making process to build trust and mitigate downstream harms. Arthur’s explainability features offer a variety of explanation options, including counterfactual explanations (which show how a model’s prediction would have changed under a hypothetical, what-if scenario) as well as popular methods such as LIME and SHAP.
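As a rough illustration of what a post-hoc, input/output-only explanation looks like in practice, the sketch below applies SHAP to a generic scikit-learn model. The dataset, model, and explainer choice are illustrative stand-ins and are not tied to Arthur’s platform.

```python
# A minimal post-hoc explanation sketch with SHAP (assumes the `shap` and
# `scikit-learn` packages are installed; the model and data are stand-ins).
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Train a "black box" model. The explainer only needs access to the
# model's inputs and outputs, not its internals.
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Post-hoc explanation: attribute each prediction to the input features.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:50])   # one attribution per feature

# Per-feature contributions to the prediction for the first row.
print(dict(zip(X.columns, shap_values[0].round(2))))
```

Human-centered evaluation then asks the follow-on question: once a person is shown attributions like these, how do they actually interpret them, and how does that change their decisions?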