Skin cancer detection apps disappoint in rare lesion recognition

A British study showed that both a direct-to-consumer app and an app used for scientific purposes failed to recognise rare skin cancers.

Are existing app models valid in the diagnosis of rare skin cancers?

Machine-learning models for skin cancer recognition have reported performance comparable or superior to dermatologists in controlled settings1. Consumer apps promising to detect skin cancer have become increasingly common. However, are existing models also valid in the diagnosis of rare skin cancers? Dr Lloyd Steele (Queen Mary University of London, UK) and his colleagues wanted to assess the ability of the apps to recognise Merkel cell carcinoma (MCC) and amelanotic melanoma, and whether the apps were capable of identifying common benign lesions such as seborrhoeic keratoses and haemangiomas.

Thus, they created a dataset of 116 images of these rare cancers and benign lesions and assessed the images with 2 machine-learning models. The first model was a certified medical device, sold directly to the public and advertised as “being able to diagnose 95% of skin cancers” (Model 1). The second model was available for research purposes only and served as a reference (Model 2).

Machine learning algorithms are not fit for purpose when deployed in a real-world setting

Model 1 incorrectly classified 5 of 28 (17.9%) MCCs and 8 of 35 (22.9%) amelanotic melanomas as low-risk. Meanwhile, 62.2% of seborrhoeic keratoses and haemangiomas were classified as high-risk. For detecting malignancy, Model 1’s sensitivity was 79.4% [95% CI 69.3–89.4] and its specificity was 37.7% [95% CI 24.7–50.8]. The results for Model 2 were even worse: MCC was not included in the top 5 diagnoses for any of the 28 MCC images analysed, raising the possibility that the model had not been trained to recognise MCC.
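The reported figures are consistent with the counts implied by the study (63 malignant images, of which 50 were flagged high-risk, and 53 benign images, of which 20 were flagged low-risk) combined with a simple Wald confidence interval. As a minimal sketch of how such point estimates and intervals are derived (the exact counts and interval method are inferred, not stated in the poster):

```python
import math

def wald_ci(successes: int, n: int, z: float = 1.96):
    """Proportion with an approximate 95% Wald confidence interval."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

# Implied counts: 50/63 malignant lesions flagged high-risk (sensitivity),
# 20/53 benign lesions flagged low-risk (specificity).
sens, sens_lo, sens_hi = wald_ci(50, 63)
spec, spec_lo, spec_hi = wald_ci(20, 53)
print(f"Sensitivity {sens:.1%} [95% CI {sens_lo:.1%}-{sens_hi:.1%}]")
print(f"Specificity {spec:.1%} [95% CI {spec_lo:.1%}-{spec_hi:.1%}]")
```

Running this reproduces the reported values to within rounding (79.4% [69.4–89.4] and 37.7% [24.7–50.8]), which supports the reading that roughly a fifth of rare cancers were missed while most benign lesions were over-called.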

The results raise a broader question about the safety of other artificial intelligence models for detecting skin cancer available on the market. While ignoring or excluding rare skin cancers is a convenient strategy for in silico machine-learning validation studies, it means the resulting algorithms are not fit for purpose when deployed in a real-world setting.

“In order to improve, machine-learning model evaluations should consider the spectrum of diseases that will be seen in practice. At the moment, most of the performance of those models is driven by the imaging data available, which is particularly scarce when it comes to rare skin cancers,” Dr Steele commented. A global collaboration between research groups and hospitals could be a step towards closing the gap in skin cancer imaging data, a crucial element for high-performing machine-learning models.

References
  1. Tschandl P, et al. JAMA Dermatol 2019;155:58–65.
  2. Steele L, et al. Do AI models recognise rare, aggressive skin cancers? An assessment of a direct-to-consumer app in the diagnosis of Merkel cell carcinoma and amelanotic melanoma. P0604, EADV Congress 2021, 29 Sept–2 Oct.