Blog

by Graham Head 26 Jul, 2021
My last post argues that an awful lot of analytics, not only those produced by machine learning, require human interpretation. This is hardly a new observation, of course. A while ago Johns Hopkins PHA tweeted a quotation from Jonathan Weiner, the co-founding director of the Johns Hopkins Center for Population Health IT, which arose during a discussion on data stewardship: “I have never seen computer analytics do a better job than a combination of humans and computers,” said Weiner. “As we develop the data, the human interactions, the human interfaces, are just as important – if not more – than advanced computer science techniques.”

This combination of human and algorithm might seem at first to create a ‘cyborg analytics’ – not fully human, with an underlying burnished glint of metal, but close enough that its products have that whiff of the alien which places them firmly in the ‘uncanny valley’. That might actually be more off-putting than the HAL-like blank-faced computer that makes decisions which remain ultimately unexplained and unexplainable. However, Weiner’s combination of humans and computers might be better thought of as a partnership, with each half of the relationship bringing something unique to the whole. And for that partnership to work, there must be an explanatory element in each and every algorithm – something which goes a measurable way towards providing an understanding of how any given result was arrived at. This is the core aspect of the ‘Wolves and Huskies’ example I discussed last time.

But what is it that the human analyst brings to the partnership that is so necessary? One answer might be something akin to ‘tacit knowledge’. I first came across this notion many years ago, at a retirement party for a field archaeologist who had spent much of her working life with the British Museum. She explained how, when assessing certain types of finds, the trainee slowly acquired a sense of how to interpret them. The accumulated layers of expertise came about through exposure to many different objects, over time, and through learning from others’ views about them. She was profoundly dismissive of certain types of analytical techniques – those of the ‘bean counters’ – which were loved by accountants and their ilk. For her, the knowledge needed by an expert professional in the field was something slightly evanescent, which could only be developed through years of deep experience.

In one sense this isn’t quite right – or at least this binary division between the bean counters and the field workers leaves a lot out. I know of several senior accountants who speak of times when the books they are auditing just don’t ‘smell’ right. There is something about them which they can’t explain, but which forces them to look deeper to see where the discrepancies are, and what dark deeds against the Gods of fiscal rectitude are hidden. At the same time, field archaeology has been revolutionised by a whole suite of tools which make accurate on-site measurements far easier to take, improving stratigraphy, dating and assessment of provenance immeasurably.

And yet this notion that one learns through doing – that an understanding of archaeological finds grows as one is exposed to more and more examples, and hears what other experts say about them – sounds suspiciously like many machine learning methods.
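A quick aside for the technically minded: the parallel can be made concrete in a few lines of code. What follows is a minimal, hypothetical sketch (scikit-learn, entirely synthetic data – nothing to do with archaeology or any real system) of a model whose judgement improves as it is shown more and more labelled examples.

```python
# Toy illustration: a classifier trained incrementally, so its accuracy
# sharpens with exposure to more labelled examples over time.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Synthetic 'finds': 3,000 examples, each described by 20 measurements.
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)

model = SGDClassifier(loss="log_loss", random_state=0)
classes = np.unique(y)

# Feed the examples to the learner a few hundred at a time.
for start in range(0, len(X), 500):
    batch = slice(start, start + 500)
    model.partial_fit(X[batch], y[batch], classes=classes)
    print(f"seen {start + 500:4d} examples, accuracy so far: {model.score(X, y):.2f}")
```

The point is simply the shape of the process – judgement built up example by example – not the particular algorithm.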
Thus, expert radiographers, who have built their understanding of what an X-ray is really showing through years of hard-won experience, can now be equalled, in part, by carefully trained algorithms looking at the self-same images. And there is no problem with recruitment when it comes to the algorithms. Some of the claims about tacit knowledge might begin to seem like special pleading. But it isn’t quite as simple as that: the AI assessments still need verification, and there are hard cases that are undecidable by the algorithmic tools. So human judgement is required, and that judgement is only arrived at through those years of experience.

Perhaps a better way to think about this is ‘domain knowledge’. Going back to our example of Wolves and Huskies, the human test subjects who spotted how the computer had made its mistakes were drawing on a wider, associated web of knowledge that included an understanding of weather, colour, and the habits of huskies and wolves. It wasn’t limited to the training data captured in the photographs. The human partner in Weiner’s unCyborgian combined analytics brings that wider knowledge of connections between many, many other facts.

One final thought: this has a direct bearing on recruitment. When building a team to create an AI tool that solves a real-world problem, the domain experts must be seen as at least as important as the computer scientists who understand how the algorithms are best built and how the tech is most efficiently exploited. Both are essential. And on both sides there is a requirement to speak the other’s language, to some degree at least.
by Graham Head 26 Jul, 2021
There’s a well-known story that people tell when they want to argue that Artificial Intelligence is untrustworthy. It usually goes something like this: a group of AI researchers wanted to test their latest piece of software, so they trained it on images of huskies and wolves until it could distinguish one from the other. It got pretty good – to the point where it was getting it right nearly every time. Then it got one wrong, and identified a husky as a wolf. When the computer scientists investigated why, it turned out that they had trained the system on pictures of wolves in snow, and of huskies in other terrain. That meant that when they showed it a picture of a husky in a snowy field, it saw the snow and answered ‘wolf’, because all the other pictures with snow had been wolves. In fact, they’d built an AI tool that detected snow!

This is often cited as an example of how the data fed to a machine learning system can result in systematic bias. In this case, you might say that because of an unnoticed correlation in the sample data, the resulting algorithm was biased against identifying huskies in the snow as huskies. Such mistakes are a real and dangerous issue. They may, for example, serve to replicate real-world prejudices: training data that features only female infant school teachers will lead the system to conclude that all such teachers are female. More worryingly, if current processes for assessing bank loan applications are unfairly prejudiced against black and other minority applicants, then using historic loans data without modification could cause a new AI solution to replicate those biases, unfairly rejecting good candidates. The Information Commissioner has focused on this issue in its recent Guidance on Artificial Intelligence and Data Protection. This is a thorough and helpful piece of work, and I recommend it wholeheartedly.

Unfortunately, although this is a real and very important issue, it isn’t what the husky and wolf story is really about at all. The story has become distorted in the telling, and in some ways the original is much more interesting. In fact, the researchers knowingly fed the machine learning tool biased data – and it wasn’t a new tool. They wanted a system that threw up errors, because their research was about how users of such systems reacted when those errors occurred, and about what strategies could be used to create confidence in AI systems.

In this research, Marco Tulio Ribeiro, Sameer Singh and Carlos Guestrin argue that:

“Despite widespread adoption, machine learning models remain mostly black boxes. Understanding the reasons behind predictions is, however, quite important in assessing trust, which is fundamental if one plans to take action based on a prediction, or when choosing whether to deploy a new model. Such understanding also provides insights into the model, which can be used to transform an untrustworthy model or prediction into a trustworthy one.”

In other words, the research was about human reactions to AI, and how trust in AI might be earned. For without trust, machine learning won’t be successful. In the 2016 paper, ‘“Why Should I Trust You?” Explaining the Predictions of Any Classifier’, they go on to propose a trust model of their own, and suggest techniques for building trust in predictive models by explaining their outputs. They give examples from text classification (e.g. random forests) and image classification (e.g. neural networks) – and it is the latter which throws up the example of the huskies and wolves.
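Before looking at what the researchers actually did, it may help to see, in miniature, how a spurious correlation of this kind becomes the decision rule. The following is a hypothetical sketch (synthetic data, scikit-learn, invented feature names – not the paper’s real image-based setup): when ‘snow in the background’ perfectly tracks the ‘wolf’ label in the training data, the model learns snow, and then mislabels a husky photographed in snow.

```python
# Hypothetical toy: snow is perfectly correlated with 'wolf' in the training
# data, so the model leans on snow rather than on anything about the animal.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200

is_wolf = rng.integers(0, 2, size=n)              # 1 = wolf, 0 = husky
snow = is_wolf.astype(float)                      # every wolf photo has snow, no husky photo does
ear_shape = is_wolf + rng.normal(0, 1.5, size=n)  # a genuine but noisy animal feature

X_train = np.column_stack([snow, ear_shape])
model = LogisticRegression().fit(X_train, is_wolf)

# A husky photographed in a snowy field: snow present, husky-like ears.
husky_in_snow = np.array([[1.0, 0.0]])
print(model.predict(husky_in_snow))   # almost certainly [1], i.e. 'wolf'
print(model.coef_)                    # the snow coefficient dwarfs the ear-shape one
```

With that mechanism in mind, back to the paper itself.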
They trained the system on hand-selected images of huskies and wolves, deliberately creating the false correlation of wolves with snow, and then allowed a test group of subjects to use the system. The research looked at how explanatory material about what the system was doing affected the test subjects’ confidence in the system when false results were found.

This is important. There is a real risk in deploying and using AI systems without also providing some insight into how they are reaching the conclusions they do. Mistakes will occur, but by building an understanding into deployments, trust can be built, users can be more confident of the outcomes, and they can also, hopefully, spot erroneous results and guard against following the machine too slavishly.
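As an illustration of what that explanatory material can look like in practice, here is a minimal sketch using the open-source lime package released alongside the paper. The model (wolf_husky_model) and the image (husky_in_snow_image) are hypothetical stand-ins; the point is simply that the explanation highlights which parts of the image drove a single prediction, so a user can see whether the model was looking at the animal or at the snow.

```python
# Minimal sketch of a LIME image explanation. `wolf_husky_model` and
# `husky_in_snow_image` are hypothetical placeholders for a trained
# classifier and an (H, W, 3) numpy image.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def classifier_fn(images):
    # LIME passes in a batch of perturbed copies of the image and expects
    # class probabilities back, one row per image.
    return wolf_husky_model.predict(np.asarray(images))

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    husky_in_snow_image,
    classifier_fn,
    top_labels=2,
    num_samples=1000,   # how many perturbed images are used to probe the model
)

# Keep only the superpixels that most supported the predicted label.
image, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False
)
highlighted = mark_boundaries(image / 255.0, mask)  # assumes 0-255 pixel values
# If the highlighted regions are all snow rather than dog, the user has good
# reason to distrust this particular prediction - which is exactly the kind of
# insight the test subjects in the study were given.
```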