Data used to build algorithms detecting skin disease is too white
Public skin image datasets that are used to train algorithms to detect skin problems don’t include enough information about skin tone, according to a new analysis. And within the datasets where skin tone information is available, only a very small number of images are of darker skin — so algorithms built using these datasets might not be as accurate for people who aren’t white.
The study, published today in The Lancet Digital Health, examined 21 freely accessible datasets of images of skin conditions. Combined, they contained over 100,000 images. Just over 1,400 of those images had information attached about the ethnicity of the patient, and only 2,236 had information about skin color. This lack of data limits researchers’ ability to spot biases in algorithms trained on the images. And such algorithms could very well be biased: Of the images with skin tone information, only 11 were from patients with the darkest two categories on the Fitzpatrick scale, which classifies skin color. Among the images with ethnicity information, none were from patients with an African, Afro-Caribbean, or South Asian background.
The conclusions are similar to those from a study published in September, which also found that most datasets used for training dermatology algorithms don’t have information about ethnicity or skin tone. That study examined the data behind 70 studies that developed or tested algorithms and found that only seven described the skin types in the images used.
“What we see from the small number of papers that do report out skin tone distributions, is that those do show an underrepresentation of darker skin tones,” says Roxana Daneshjou, a clinical scholar in dermatology at Stanford University and an author of the September paper. Her paper analyzed many of the same datasets as the new Lancet research and came to similar conclusions.
When images in a dataset are publicly available, researchers can go through and review what skin tones appear to be present. But that can be difficult, because photos may not exactly match what the skin tone looks like in real life. “The most ideal situation is that skin tone is noted at the time of the clinical visit,” Daneshjou says. Then, the image of that patient’s skin problem could be labeled before it goes into a database.
Without labels on images, researchers can’t check whether the datasets used to build an algorithm include enough examples of people with different skin types.
It’s important to scrutinize these image sets because they’re often used to build algorithms that help doctors diagnose patients with skin conditions. Some of those conditions, like skin cancers, are more dangerous if they’re not caught early. If the algorithms have only been trained or tested on light skin, they may not be as accurate for everyone else. “Research has shown that programs trained on images taken from people with lighter skin types only might not be as accurate for people with darker skin, and vice versa,” says David Wen, a co-author on the new paper and a researcher at the University of Oxford.
New images can always be added to public datasets, and researchers want to see more examples of conditions on darker skin. And improving the transparency and clarity of the datasets will help researchers track progress toward more diverse image sets that could lead to more equitable AI tools. “I would like to see more open data and more well-labeled data,” Daneshjou says.