Putting the data before the algorithm in big data addressing personalized healthcare.

Published on Aug 19, 2019
DOI: 10.1038/S41746-019-0157-2
Eli M. Cahan
Estimated H-index: 3
(NYU: New York University),
Tina Hernandez-Boussard
Estimated H-index: 57
(Stanford University),
Daniel L. Rubin
Estimated H-index: 70
(Stanford University)
Technologies leveraging big data, including predictive algorithms and machine learning, are playing an increasingly important role in the delivery of healthcare. However, evidence indicates that such algorithms have the potential to worsen disparities currently intrinsic to the contemporary healthcare system, including racial biases. Blame for these deficiencies has often been placed on the algorithm—but the underlying training data bears greater responsibility for these errors, as biased outputs are inexorably produced by biased inputs. The utility, equity, and generalizability of predictive models depend on population-representative training data with robust feature sets. So while the conventional paradigm of big data is deductive in nature—clinical decision support—a future model harnesses the potential of big data for inductive reasoning. This may be conceptualized as clinical decision questioning, intended to liberate the human predictive process from preconceived lenses in data solicitation and/or interpretation. Efficacy, representativeness and generalizability are all heightened in this schema. Thus, the possible risks of biased big data arising from the inputs themselves must be acknowledged and addressed. Awareness of data deficiencies, structures for data inclusiveness, strategies for data sanitation, and mechanisms for data correction can help realize the potential of big data for a personalized medicine era. Applied deliberately, these considerations could help mitigate risks of perpetuation of health inequity amidst widespread adoption of novel applications of big data.
#1Liangyuan Na (MIT: Massachusetts Institute of Technology)H-Index: 1
#2Cong Yang (University of California, Berkeley)H-Index: 1
Last. Anil Aswani (University of California, Berkeley)H-Index: 16
view all 6 authors...
Importance Despite data aggregation and removal of protected health information, there is concern that deidentified physical activity (PA) data collected from wearable devices can be reidentified. Organizations collecting or distributing such data suggest that the aforementioned measures are sufficient to ensure privacy. However, no studies, to our knowledge, have been published that demonstrate the possibility or impossibility of reidentifying such activity data. Objective To evaluate the feasi...
44 Citations
#1Effy Vayena (ETH Zurich)H-Index: 30
#2Alessandro Blasimme (ETH Zurich)H-Index: 19
Last. I. Glenn Cohen (Harvard University)H-Index: 24
view all 3 authors...
Effy Vayena and colleagues argue that machine learning in medicine must offer data protection, algorithmic transparency, and accountability to earn the trust of patients and clinicians.
148 Citations
#1John R. Zech (CPMC: California Pacific Medical Center)H-Index: 6
#2Marcus A. BadgeleyH-Index: 13
Last. Eric K. Oermann (ISMMS: Icahn School of Medicine at Mount Sinai)H-Index: 24
view all 6 authors...
Background There is interest in using convolutional neural networks (CNNs) to analyze medical imaging to provide computer-aided diagnosis (CAD). Recent work has suggested that image classification CNNs may not generalize to new data as well as previously believed. We assessed how well CNNs generalized across three hospital systems for a simulated pneumonia screening task. Methods and findings A cross-sectional design with multiple model training cohorts was used to evaluate model generalizabilit...
425 Citations
#1Alice B. Popejoy (Stanford University)H-Index: 6
#2Deborah I. Ritter (BCM: Baylor College of Medicine)H-Index: 15
Last. Carlos Bustamante (Stanford University)H-Index: 171
view all 16 authors...
: The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVa...
43 Citations
#1Milena A. Gianfrancesco (UCSF: University of California, San Francisco)H-Index: 20
#2Suzanne Tamang (Stanford University)H-Index: 9
Last. Gabriela Schmajuk (UCSF: University of California, San Francisco)H-Index: 25
view all 4 authors...
A promise of machine learning in health care is the avoidance of biases in diagnosis and treatment; a computer algorithm could objectively synthesize and interpret the data in the medical record. Integration of machine learning with clinical decision support tools, such as computerized alerts or diagnostic support, may offer physicians and others who provide health care targeted and timely information that can improve clinical decisions. Machine learning algorithms, however, may also be subject ...
252 Citations
#1Mehdi Momen (SBUK: Shahid Bahonar University of Kerman)H-Index: 8
#2Ahmad Ayatollahi Mehrgardi (SBUK: Shahid Bahonar University of Kerman)H-Index: 6
Last. Daniel Gianola (UW: University of Wisconsin-Madison)H-Index: 83
view all 9 authors...
Network-based statistical models accounting for putative causal relationships among multiple phenotypes can be used to infer single-nucleotide polymorphism (SNP) effects transmitted through a given causal path in genome-wide association studies (GWAS). In GWAS with multiple phenotypes, reconstructing underlying causal structures among traits and SNPs using a single statistical framework is essential for understanding the entirety of genotype-phenotype maps. A structural equation model (SEM...
16 Citations
#1Jessica K. Paulus (Tufts University)H-Index: 22
#2Benjamin S. Wessler (Tufts University)H-Index: 11
Last. David M. Kent (Tufts University)H-Index: 61
view all 4 authors...
Racial/ethnic status is frequently a strong predictor of clinical outcomes for an array of conditions, including cardiovascular disease (CVD).1 Several popular clinical prediction models (CPMs) that help guide common medical decisions, such as equations for 10-year atherosclerotic CVD risk, estimated glomerular filtration rate, and pulmonary function, include terms for race. Nevertheless, the use of racial classifications in medicine and biomedical research has been contested based on evidence t...
6 Citations
#1S. Claiborne Johnston (University of Texas at Austin)H-Index: 104
Artificial intelligence and other forms of information technology are only just beginning to change the practice of medicine. The pace of change is expected to accelerate as tools improve and as demands for analyzing a rapidly growing body of knowledge and array of data increase. The medical student
23 Citations
#1Ian A Scott (UQ: University of Queensland)H-Index: 46
Machine learning, which converts complex data into algorithms, challenges the traditional epidemiologic approach of evidence-based medicine. This commentary discusses intersections between these ap...
11 Citations
Computer scientists must identify sources of bias, de-bias training data and develop artificial-intelligence algorithms that are robust to skews in the data, argue James Zou and Londa Schiebinger.
203 Citations
Cited By 34
#1Marc G. Chevrette (UW: University of Wisconsin-Madison)H-Index: 14
#2Athina Gavrilidou (University of Tübingen)
Last. Francisco Barona-Gómez (CINVESTAV)H-Index: 21
This review covers literature between 2003–2021. The development and application of genome mining tools has given rise to ever-growing genetic and chemical databases and propelled natural products research into the modern age of Big Data. Likewise, an explosion of evolutionary studies has unveiled genetic patterns of natural products biosynthesis and function that support Darwin's theory of natural selection and other theories of adaptation and diversification. In this review, we aim to highlight ...
#1Suraj K Jaladanki (Mount Sinai Hospital)H-Index: 5
#2Akhil Vaid (Mount Sinai Hospital)H-Index: 3
Last. Alexander W. CharneyH-Index: 18
view all 14 authors...
Federated learning is a technique for training predictive models without sharing patient-level data, thus maintaining data security while allowing inter-institutional collaboration. We used federated learning to predict acute kidney injury within three and seven days of admission, using demographics, comorbidities, vital signs, and laboratory values, in 4029 adults hospitalized with COVID-19 at five sociodemographically diverse New York City hospitals, between March-October 2020. Prediction perf...
#1Navchetan Kaur (UCSF: University of California, San Francisco)H-Index: 5
Last. Atul J. Butte (UCSF: University of California, San Francisco)H-Index: 90
view all 3 authors...
A huge array of data in nephrology is collected through patient registries, large epidemiological studies, electronic health records, administrative claims, clinical trial repositories, mobile health devices and molecular databases. Application of these big data, particularly using machine-learning algorithms, provides a unique opportunity to obtain novel insights into kidney diseases, facilitate personalized medicine and improve patient care. Efforts to make large volumes of data freely accessi...
#1Panagiotis Anagnostou (UTH: University of Thessaly)H-Index: 1
#2Sotiris K. Tasoulis (UTH: University of Thessaly)H-Index: 10
Last. Laia Egea-CortésH-Index: 3
view all 19 authors...
Preventive healthcare is a crucial pillar of health as it contributes to staying healthy and having immediate treatment when needed. Mining knowledge from longitudinal studies has the potential to significantly contribute to the improvement of preventive healthcare. Unfortunately, data originating from such studies are characterized by high complexity, huge volume, and a plethora of missing values. Machine Learning, Data Mining and Data Imputation models are utilized as part of solving these chall...
#1Cecilia S Lee (UW: University of Washington)H-Index: 20
#1Cecilia S. LeeH-Index: 2
view all 3 authors...
#1Jamal Elkhader (Cornell University)H-Index: 2
#2Olivier Elemento (Cornell University)H-Index: 82
In the past few years, Artificial Intelligence (AI) techniques have been applied to almost every facet of oncology, from basic research to drug development and clinical care. In the clinical arena where AI has perhaps received the most attention, AI is showing promise in enhancing and automating image-based diagnostic approaches in fields such as radiology and pathology. Robust AI applications, which retain high performance and reproducibility over multiple datasets, extend from predicting indic...
#1Huong Ly TongH-Index: 7
#2Juan C. QuirozH-Index: 13
Last. Liliana LaranjoH-Index: 12
view all 8 authors...
Given that the one-size-fits-all approach to mobile health interventions has limited effects, a personalized approach might be necessary to promote healthy behaviors and prevent chronic conditions. Our systematic review aims to evaluate the effectiveness of personalized mobile interventions on lifestyle behaviors (i.e., physical activity, diet, smoking and alcohol consumption), and identify the effective key features of such interventions. We included any experimental trials that tested a perso...
1 Citations
#1BM Zeeshan Hameed (Manipal University)H-Index: 4
#2Aiswarya V L S Dhavileswarapu (GITAM: Gandhi Institute of Technology and Management)
Last. Bhaskar K. Somani (Manipal University)H-Index: 37
view all 7 authors...
Artificial intelligence (AI) has a proven record of application in the field of medicine and is used in various urological conditions such as oncology, urolithiasis, paediatric urology, urogynaecol...
#2Mohd JavaidH-Index: 24
Digital imaging and medical reporting have acquired an essential role in healthcare, but the main challenge is the storage of a high volume of patient data. Although newer technologies are already ...
5 Citations