The elephant in the machine: proposing a new metric of data reliability and its application to a medical case to assess classification reliability

Published on Jun 1, 2020 in Applied Sciences
DOI: 10.3390/APP10114014
Federico Cabitza (Estimated H-index: 22)
Andrea Campagner (Estimated H-index: 6)
+ 12 Authors
Luca Maria Sconfienza (Estimated H-index: 37)
#1 Federico Cabitza (University of Milan), H-Index: 22
#2 Andrea Campagner, H-Index: 6
Last. Clara Balsano (University of L'Aquila), H-Index: 33
Interest in the application of machine learning (ML) techniques to medicine is growing fast and wide because of their ability to endow decision support systems with so-called artificial intelligence, particularly in those medical disciplines that extensively rely on digital imaging. Nonetheless, achieving a pragmatic and ecological validation of medical AI systems in real-world settings is difficult, even when these systems exhibit very high accuracy in laboratory settings. This difficulty has...
12 Citations
#1 Xiaoxuan Liu, H-Index: 13
#2 Livia Faes (Moorfields Eye Hospital), H-Index: 12
Last. Alastair K Denniston, H-Index: 31
Summary. Background: Deep learning offers considerable promise for medical diagnostics. We aimed to evaluate the diagnostic accuracy of deep learning algorithms versus health-care professionals in classifying diseases using medical imaging. Methods: In this systematic review and meta-analysis, we searched Ovid-MEDLINE, Embase, Science Citation Index, and Conference Proceedings Citation Index for studies published from Jan 1, 2012, to June 6, 2019. Studies comparing the diagnostic performance of dee...
330 Citations
#1 Alvin Rajkomar, H-Index: 11
#1 Alvin Rajkomar, H-Index: 5
Last. Isaac S. Kohane, H-Index: 107
Machine Learning in Medicine. In this view of the future of medicine, patient–provider interactions are informed and supported by massive amounts of data from interactions with similar patients. The...
597 Citations
#1 Eric J. Topol (Scripps Health), H-Index: 223
The use of artificial intelligence, and the deep-learning subtype in particular, has been enabled by the use of labeled big data, along with markedly enhanced computing power and cloud storage, across all sectors. In medicine, this is beginning to have an impact at three levels: for clinicians, predominantly via rapid, accurate image interpretation; for health systems, by improving workflow and the potential for reducing medical errors; and for patients, by enabling them to process their own dat...
1,100 Citations
#1 Nicholas Bien (Stanford University), H-Index: 1
#2 Pranav Rajpurkar (Stanford University), H-Index: 19
Last. Matthew P. Lungren (Stanford University), H-Index: 26
Background: Magnetic resonance imaging (MRI) of the knee is the preferred method for diagnosing knee injuries. However, interpretation of knee MRI is time-intensive and subject to diagnostic error and variability. An automated system for interpreting knee MRI could prioritize high-risk patients and assist clinicians in making diagnoses. Deep learning methods, in being able to automatically learn layers of features, are well suited for modeling the complex relationships between medical images and ...
168 Citations
It was recently estimated that one billion radiologic examinations are performed worldwide annually, most of which are interpreted by radiologists [1]. Most professional bodies would agree that all imaging procedures should include an expert radiologist’s opinion, given by means of a written report [2]. This activity constitutes much of the daily work of practising radiologists. We don’t always get it right. Although not always appreciated by the public, or indeed by referring doctors, radiologi...
153 Citations
#2 Xinshu Zhao, H-Index: 17
2 Citations
#1 Carl-Magnus Svensson (Leibniz Association), H-Index: 11
#2 Ron Hübler (FSU: University of Jena), H-Index: 2
Last. Marc Thilo Figge (FSU: University of Jena), H-Index: 27
Application of personalized medicine requires integration of different data to determine each patient's unique clinical constitution. The automated analysis of medical data is a growing field where different machine learning techniques are used to minimize the time-consuming task of manual analysis. The evaluation, and often training, of automated classifiers requires manually labelled data as ground truth. In many cases such labelling is not perfect, either because of the data being ambiguous e...
22 Citations
A wide variety of research studies suggest that breakdowns in the diagnostic process result in a staggering toll of harm and patient deaths. These include autopsy studies, case reviews, surveys of patient and physicians, voluntary reporting systems, using standardised patients, second reviews, diagnostic testing audits and closed claims reviews. Although these different approaches provide important information and unique insights regarding diagnostic errors, each has limitations and none is well...
294 Citations
#1 Laura Duffy (Glasgow Royal Infirmary), H-Index: 1
#2 Shelley Gajree, H-Index: 1
Last. Terence J. Quinn, H-Index: 37
Background and Purpose—The Barthel Index (BI) is a 10-item measure of activities of daily living which is frequently used in clinical practice and as a trial outcome measure in stroke. We sought to describe the reliability (interobserver variability) of standard BI in stroke cohorts using systematic review and meta-analysis of published studies. Methods—Two assessors independently searched various multidisciplinary electronic databases from inception to April 2012 inclusive. Inclusion criteria c...
142 Citations
Cited By (7)
#1 Nicola K. Dinsdale (University of Oxford), H-Index: 4
#2 Emma Bluemke, H-Index: 8
The combination of deep learning image analysis methods and large-scale imaging datasets offers many opportunities to imaging neuroscience and epidemiology. However, despite the success of deep learning when applied to many neuroimaging tasks, there remain barriers to the clinical translation of large-scale datasets and processing tools. Here, we explore the main challenges and the approaches that have been explored to overcome them. We focus on issues relating to data availability, interpretabi...
#1 Federico Cabitza (University of Milano-Bicocca), H-Index: 22
#2 Andrea Campagner (University of Milano-Bicocca), H-Index: 6
Abstract: This editorial aims to contribute to the current debate about the quality of studies that apply machine learning (ML) methodologies to medical data to extract value from them and provide clinicians with viable and useful tools supporting everyday care practices. We propose a practical checklist to help authors to self-assess the quality of their contribution and to help reviewers to recognize and appreciate high-quality medical ML studies by distinguishing them from the mere a...
2 Citations
Last. Dong Yin (Google), H-Index: 14
We propose a simulation framework for generating realistic instance-dependent noisy labels via a pseudo-labeling paradigm. We show that this framework generates synthetic noisy labels that exhibit important characteristics of the label noise in practical settings via comparison with the CIFAR10-H dataset. Equipped with controllable label noise, we study the negative impact of noisy labels across a few realistic settings to understand when label noise is more problematic. We also benchmark severa...
#1 Vito Chianca (UniPi: University of Pisa)
#2 Domenico Albano (University of Palermo), H-Index: 24
Last. Luca Maria Sconfienza (University of Milan), H-Index: 37
In the last two decades, relevant progress has been made in the diagnosis of musculoskeletal tumors due to the development of new imaging tools, such as diffusion-weighted imaging, diffusion kurtosis imaging, magnetic resonance spectroscopy, and diffusion tensor imaging. Another important role has been played by the development of artificial intelligence software based on complex algorithms, which employ computing power in the detection of specific tumor types. The aim of this article is to repo...
#1 Andrea Campagner, H-Index: 6
#2 Davide Ciucci, H-Index: 23
Last. Federico Cabitza, H-Index: 22
Abstract: In recent years, Machine Learning (ML) has attracted wide interest as an aid for decision makers in complex domains, such as medicine. Although domain experts are typically aware of the intrinsic uncertainty around it, the issue of Ground Truth (GT) quality has scarcely been addressed in the ML literature. GT quality is regularly assumed to be adequate, regardless of the number and skills of raters involved in data annotation. These factors can, however, potentially have a severe negative ...
6 Citations
In recent times, with the advancement in technology and revolution in digital information, networks generate massive amounts of data. Due to the massive and rapid transmission of data, keeping up with security requirements is becoming more challenging. Machine learning (ML)-based intrusion detection systems (IDSs) are considered as one of the most suitable solutions for big data security. Despite the progress in ML, unrelated features can drastically influence the performance of an IDS. Feature ...
2 Citations
#1 Federico Cabitza (University of Milan), H-Index: 22
#2 Andrea Campagner, H-Index: 6
Last. Luca Maria Sconfienza (University of Milan), H-Index: 37
We focus on the importance of interpreting the quality of the labeling used as the input of predictive models to understand the reliability of their output in support of human decision-making, especially in critical domains, such as medicine. Accordingly, we propose a framework distinguishing the reference labeling (or Gold Standard) from the set of annotations from which it is usually derived (the Diamond Standard). We define a set of quality dimensions and related metrics: representativeness (...
6 Citations