How data algorithms can steer our decisions in a better direction - albeit not without a hidden cost - with Miikka Ermes, Lead Data Scientist from Tieto.
Judges, doctors, C-level managers. What unites them is not only their direct impact on individual lives but also the multitude of viewpoints involved in making a single, fair decision.
Toss away your piles of papers & post-its. Data algorithms come to the rescue, assisting in a sophisticated decision-making process shared between a human and a machine.
Listen in as Miikka Ermes, Lead Data Scientist from Tieto, explains how he develops assistive algorithms and how such algorithms can be created ethically and sustainably. Not everything in this tech is as trivial as it might seem.
0.
Data technologies have seamlessly penetrated our everyday lives over the past decade. We see doctors finally utilising the large amounts of health data collected over the years to decide on the best course of individual treatment for patients. Teachers can assess students' work effectively with plagiarism-detection tools. Emergency call centres use modern algorithms to decide whether a call represents an emergency case, based on an overview of the neighbourhood and of the reported family. The scope of data applications is astonishingly big!
Data technologies help us make better, much more considered decisions than we'd make on our own. Instead of doctors gathering in a room to discuss the course of action - each bringing input from their own experience - today a doctor can consult a data algorithm that will assess risks and success rates based on thousands of patients. But how much of the final decision is attributed to the algorithm's result, and how much - to the human decision-maker?
And more importantly, how can we ensure that the final decision is ethical and fair?
1.
My name is Anastasiia Kozina, and you're listening to The Dot, a podcast about building a sustainable mindset in design and tech. Let's cross your Ts and dot the Is on how we can make better decisions with data technologies.
2.
I remember being a teenager; my parents had a subscription to this popular science magazine, and in the 90s they had all these genetic things - how to find out things about our health from the genes.
All that stuff was coming up in the 90s, and I was reading those things. At some point, I think, that was the thing I wanted to do.
This is the voice of Doctor Miikka Ermes, a Lead Data Scientist from Tieto who has dedicated his career to building healthcare AI.
It might've been just a single article, a couple of articles about something that I found fascinating as a teenager. And then something like that might guide your whole career, which is pretty crazy. If I hadn't read the article, I might be doing something else entirely.
The term itself, data science, is not a very old term, but essentially I was trained to do data science. It was called other things in the past. It was called "signal processing", "image processing", "data processing", that kind of stuff, or "statistics" or "math".
Then later on came the term data science, just to bridge these different topics and different areas of expertise. So I've been doing data science from the beginning.
Data technologies tremendously digitalise our world and automate our everyday activities, making space to focus on more important things. With meticulously designed algorithms, we can change the course of a whole industry. For example, the movement of digital therapeutics uses algorithms to simulate how cells would respond to a potential treatment. This is huge, because drug development can sometimes take up to 15 years, and it’s incredibly hard to find eligible study patients to proceed with in vitro experiments. With the use of wearables, patients feel more engaged in their care and trials and enjoy remote tracking. In fact, according to a McKinsey & Company study, 75% of all patients expect to use digital services in the future and receive personalised care.
We had the idea that people lead unhealthy lives because they don't have enough data. We thought that giving them more information about their behaviours - for example, about their physical exercise and level of physical activity - would facilitate people changing their behaviours.
That's like a very engineer approach: more data will automatically lead to better behaviours. It's simple. And I think that was one of the things that affected my career a lot as well. Along the way we found out that there is a lot of psychology involved as well.
It's not that straightforward. It's not just the data that affects decision-making and behaviours. You also have to take into account the psychological side of it and merge that with the technologies.
I think there is something similar going on here with this, so to say, sustainability data, or ecological impact, or environmental impact data, as had happened with the wearable personal health data I described earlier: we are currently in the phase where we expect people to change their behaviours once they get more data. And I believe that's not going to be the case. Again, there is going to be this psychological factor involved.
It's not going to be just the data that's going to facilitate the change.
Data is also used to enable early detection and prevention of emergencies with the use of biomarkers. This non-intrusive method already helps people with epilepsy to avoid seizures altogether, even at times when coming to the attention of clinicians is impossible. It’s a great development in healthcare for managing chronic conditions.
So, data helps us mitigate risks, which is extremely useful in healthcare, as the previous examples show. Both simulation and prevention algorithms can find their way into any industry though, from designing fire exit routes in architecture projects to reducing playback delays in streaming services such as Netflix.
Data algorithms are also used to predict and forecast outcomes. As we humans are tremendously bad at thinking long-term and at looking at problems in their full complexity, we trust these systems to assist us in making life-changing decisions. In the depths of this assistance lives a risk that we can’t yet find a way to tackle.
3.
In the USA, shared decision-making has become common practice in the justice system. Studies show that judges are more likely to grant parole in the morning or right after lunch, when body glucose levels go up - making their decisions unreliable. So a data system called COMPAS was introduced to guide their thinking.
COMPAS is an algorithm that predicts how likely the defendant in question is to commit another crime within 2 years of release. As COMPAS was studied closer, though, an alarming statistic came up: black people were being heavily discriminated against by the algorithm. In other words, the algorithm showed signs of racial bias.
Across popular media, we are told that biases crawl into data algorithms from the developing team. This usually implies that the team is not diverse or inclusive enough, and hence unable to look at data parameters broadly. However, this couldn't be further from the truth.
The COMPAS algorithm was built with historical parameters at its core. It took into account arrest data, among other things. A person’s race wasn’t even included as a parameter. Yet across the USA, black people are frequently arrested on false grounds due to pervasive racial views across the country. So COMPAS didn't discriminate against black people. And neither did the team that developed the algorithm. It was society that was unfair in the first place. The algorithm merely became a product of its environment.
Similar patterns can be observed across numerous cases and domains. It's curious how much the complex systems in which we operate define the outcome of our work, and how little we often understand about how these systems work.
What happened to the COMPAS system is tied to “predictive parity”: the algorithm deals with uncertainty by keeping its predictions equally accurate for every group, but when groups have different underlying arrest rates, the errors it makes fall unevenly, sometimes discriminating against some social groups by error. So, even though the algorithm was given no prejudice and was designed with a fair mindset, the errors happen. Perhaps surprisingly, much of this harm can be avoided with a few simple principles.
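To make that concrete, here is a minimal sketch in Python - with made-up numbers, not real COMPAS data - of why a score that satisfies predictive parity can still produce unequal false-alarm rates when the recorded arrest rates of two groups differ.

```python
# A minimal sketch, with made-up numbers rather than real COMPAS data,
# of how "predictive parity" can coexist with unequal error rates.
#
# A score satisfies predictive parity when, among the people it flags as
# high risk, the same fraction go on to reoffend in every group (equal
# positive predictive value). If the groups' recorded arrest rates differ,
# the rate of false alarms - people wrongly flagged - must differ too.

def false_positive_rate(base_rate: float, flag_rate: float, ppv: float) -> float:
    """False positive rate implied by a calibrated score.

    base_rate: fraction of the group recorded as reoffending
    flag_rate: fraction of the group the algorithm flags as high risk
    ppv:       fraction of flagged people who reoffend
               (equal across groups = predictive parity)
    """
    false_positives = flag_rate * (1 - ppv)  # flagged, but do not reoffend
    negatives = 1 - base_rate                # everyone who does not reoffend
    return false_positives / negatives

# Same predictive accuracy (PPV) for both groups, but group B has a higher
# recorded arrest rate, so the algorithm flags more of group B.
PPV = 0.6
group_a = false_positive_rate(base_rate=0.3, flag_rate=0.3, ppv=PPV)
group_b = false_positive_rate(base_rate=0.5, flag_rate=0.6, ppv=PPV)

print(f"Group A false positive rate: {group_a:.0%}")  # 17%
print(f"Group B false positive rate: {group_b:.0%}")  # 48%
```

The arithmetic is the whole point: once the base rates differ, equal predictive accuracy across groups forces unequal false positive rates, so the group with more recorded arrests also collects more wrongful high-risk labels.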
4.
If we are to keep using emerging technologies to aid us in our decisions, we need to set ground rules.
Rule #1: Transparency. When unfairness lurks and stirs in the algorithms, the only way to detect and fix it is to make information about the algorithm available to the public. In the US, the New York City Council passed a bill to have algorithm information open so it could be audited for systemic biases and other errors, which are still quite common in this tech. Kate Crawford from the AI Now Institute, which studies the social implications of AI, calls for breaking the loops that are not open for algorithmic auditing, for review, or for public debate.
Transparency is a great strategic element for data-powered organisations, too, as it shows whether an organisation is sustainable and accountable - qualities usually associated with companies worth trusting.
Finland, in that sense, is a pretty nice place to do research; people trust authorities, for example. I think that's one of the quite unique things here. For example, in healthcare, if you try to recruit people to studies where you basically explain to them that we need your data, your sensitive health data, so that we can develop something new that will help future patients - people trust.
People trust you. People trust the healthcare organisations. People trust other authorities, and because of that trust they are willing to take part when you tell them that we will treat your information securely and keep it private.
I think building that takes long, and it's quite easy to destroy that trust. It's a really sensitive thing, but I think that's quite unique.
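Returning to auditing for a moment: here is a minimal sketch of what an audit over an openly published decision log might look like, assuming a hypothetical log of decisions paired with eventual outcomes. The records and field names are illustrative, not any real auditing tool.

```python
# A minimal sketch of an algorithmic audit over a published decision log.
# The records and field names here are hypothetical illustrations.
from collections import defaultdict

def false_alarm_rates_by_group(records):
    """records: dicts with keys 'group', 'flagged', 'bad_outcome'."""
    false_alarms = defaultdict(int)  # flagged, but no bad outcome followed
    harmless = defaultdict(int)      # everyone with no bad outcome
    for r in records:
        if not r["bad_outcome"]:
            harmless[r["group"]] += 1
            if r["flagged"]:
                false_alarms[r["group"]] += 1
    return {g: false_alarms[g] / n for g, n in harmless.items() if n}

decision_log = [
    {"group": "A", "flagged": True,  "bad_outcome": False},
    {"group": "A", "flagged": False, "bad_outcome": False},
    {"group": "A", "flagged": True,  "bad_outcome": True},
    {"group": "B", "flagged": True,  "bad_outcome": False},
    {"group": "B", "flagged": True,  "bad_outcome": False},
    {"group": "B", "flagged": False, "bad_outcome": True},
]

# A large gap between groups is the red flag an auditor digs into.
print(false_alarm_rates_by_group(decision_log))  # {'A': 0.5, 'B': 1.0}
```

Unequal rates across groups don't prove discrimination on their own, but they tell auditors exactly where to start asking questions - and none of this is possible while the decision log stays closed.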
Rule #2: Feedback loops. We need them badly. Anything built in isolation is much more likely to bring undesirable side-effects. To iterate on data systems thoroughly and get them to the level at which they can assist us better, it’s ever more important that creator and user work in symbiosis, together.
We need to know how much of the user’s final decision is attributed to the algorithm’s result. If there’s a disagreement with the algorithm’s result, there must be a channel to appeal the decision. The feedback funnel needs to be open and engaging, because without it all we can do is keep guessing where the problem lies, putting out the wrong fires.
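What could such a loop look like in practice? Here is a minimal sketch, with entirely hypothetical names and cases, of recording the algorithm's recommendation alongside the human's final decision, so the override rate can be measured and every disagreement lands in a review queue instead of vanishing.

```python
# A minimal sketch of a human-machine feedback loop: log the algorithm's
# recommendation next to the human's final decision, measure how often
# the human overrides it, and route disagreements to an appeal channel.
# All names here are hypothetical illustrations.
from dataclasses import dataclass, field

@dataclass
class DecisionLog:
    entries: list = field(default_factory=list)

    def record(self, case_id, recommended, decided, note=""):
        self.entries.append({"case": case_id, "recommended": recommended,
                             "decided": decided, "note": note})

    def override_rate(self):
        """Share of cases where the human disagreed with the algorithm."""
        if not self.entries:
            return 0.0
        overrides = sum(e["recommended"] != e["decided"] for e in self.entries)
        return overrides / len(self.entries)

    def for_review(self):
        """Disagreements to feed back to developers and the appeal channel."""
        return [e for e in self.entries if e["recommended"] != e["decided"]]

log = DecisionLog()
log.record("case-1", recommended="high risk", decided="high risk")
log.record("case-2", recommended="high risk", decided="low risk",
           note="clinician cited context the model cannot see")

print(f"Override rate: {log.override_rate():.0%}")  # 50%
for case in log.for_review():
    print(case["case"], "-", case["note"])
```

A rising override rate, or a cluster of similar notes, tells the creators exactly which fires are worth putting out.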
Is this data safe? How do we keep it safe? What kinds of risks are there? Is there a way for someone to get access to this if we transfer it like this?
The risks are always there. It's something that we need to constantly think about and also work on - any security updates or anything, we always instantly have to install all our updates.
And also, we do this kind of - how's it called - hacking, where we're trying to break our own systems to make sure that no one else is able to do it.
Rule #3: Literacy first. If we know anything today, it’s that data-powered technologies require an overview of the wider context.
I think that it's both a major benefit as well as a risk that digital data can be shared easily. If you compare it to the good old times of having patients' reports on paper, they were never in the right place, and it was extremely difficult to share, exchange, or transfer information to another place. The data was always physically in one single place, which made it really, really tricky to actually leverage it.
So now it's electronic and it's easy. I mean, it is available at the point of care, and that's essential for patients. At the same time, it can also be used in harmful ways. It's easy to send out, or distribute, or share with some other parties who don't have a legal reason or any good reason to access the data.
In some ways, we should be able to think philosophically, asking ourselves fundamental questions such as what is fair and how we can design an ethical product. Some challenges that data-driven companies face today are unfamiliar and new. That’s why we need to broaden our mindset and invite more stakeholders into the process.
It's about networking. It's about finding the right people and trying to interact with them, trying to pick their brains to figure out how to do things. What's the state of the art? What are the best ways to do these things?
We are currently, basically, a software development team. Working in healthcare, we don't claim that we are the experts in healthcare. That's why we work with the clinicians and with the leading experts in healthcare. I think that cooperation is essential - the networks, and understanding how to find the best possible answers.
We shouldn’t shy away from including researchers & scientists in our product development, as they can help us navigate through social complexity. We shouldn’t avoid having designers on board, as they can bring new insight on necessary metrics and hypotheses, as well as insight on how users intend to use an algorithm in the first place. Another critical observation can be made around the growth of companies and open-source solutions that provide audits for data systems. So there are numerous ways to get additional insight and develop better data-powered solutions.
If you think about all the imaging, genotyping, all that stuff - bio-monitoring someone creates massive amounts of data. If you spend a day in a hospital bed with all the monitors, and all that data gets saved somewhere, it's a massive amount of data.
The thing is that it's not really possible for a human expert to leverage that amount of data.
It's hard to develop algorithms that could leverage that data by somehow mimicking the way the human expert would work, because there is no such way. So that's actually what we need to develop. It's something that's a bit disruptive in a way, and something completely novel.
5.
"We can't solve problems using the same type of thinking we used when we created them". These wise words came from Albert Einstein, and they seem more relevant than ever. As we understand the value of data and services that use data more and more, we need to adapt our mindset to arising challenges. Data governance is one way to help with the sustainable transition: the more organisations use data as a strategic asset, the more data will be validated and used correctly, and the likelier organisations will be able to adapt to new data policies and restrictions, such as GDPR.
I guess the basic principle is that when you collect data, personal data, you need to tell - what are the purposes? Why do you collect it? Why do you save it? People need to know about that. I think that's fair.
And when you openly disclose what you are going to do with the data, I don't really see GDPR restricting what you can do. Of course, it creates responsibilities, several responsibilities, for how to deal with the data. But I see it just as a positive thing and a clarification.
Probably some organisations or stakeholders who have previously been flexible with how they use the personal data they have collected - of course, they have to do more adapting to this new environment, the GDPR environment.
But I think that's all good, especially from the citizen point of view. I think that's really good progress.