How Close Are We? Utilizing Big Data to Improve Clinical Research and Care

The increased digitization and subsequent dissemination of large volumes of electronic medical records, research and development information, billing data and biometrics have created unprecedented opportunities to improve outcomes, reduce costs and accelerate innovation in health care. Colloquially referred to as "big data," these enormous datasets often combine structured and unstructured data and are difficult to analyze using conventional analytic techniques. Although a precise definition of big data remains elusive, researchers who focus on analyzing such datasets broadly agree on what the concept encompasses. In particular, the specific challenges inherent to big data, known as the five V's (volume, variety, velocity, variability and veracity), are perhaps the characteristics that best distinguish big data from traditional datasets. Early successes in the analysis of big data have already emerged. However, significant challenges remain: the research community will need to forge novel collaborations and deploy additional resources to translate the potential of big data into transformative discoveries that will impact clinical care.

Big data approaches have already moved beyond theoretical promise to demonstrate tangible gains in health-care quality. A particularly salient example is the effort by Express Scripts to identify prescription nonadherence by harnessing 10 petabytes (10 million gigabytes) of medical, laboratory and prescription data associated with the filling of 1.4 billion prescriptions per year. By implementing proprietary algorithms derived from this dataset in a tool called ScreenRx, the company can predict nonadherence with 94 percent accuracy up to one year in advance, using 400 data points derived from the patient, physician, disease and medication. Impressively, this accuracy is several-fold higher than that of traditional approaches that rely on self-reported surveys. With this tool, Express Scripts can now tailor specific compliance programs to high-risk individuals to reduce waste and improve health-care delivery.

Within gastroenterology, robust efforts are underway to leverage big data generated through interoperability among different hospital systems and electronic medical records. This integration has boosted pharmacosurveillance research for widely prescribed medications such as proton pump inhibitors. In addition, longitudinal medical records integrated with large datasets offer the potential to advance precision medicine through the development of diagnostics, biomarkers and risk prediction tools derived from these high-dimensional datasets. Finally, patient-centered engagement tools such as smartphone applications, wearables and social media offer additional data streams for symptom tracking and disease activity monitoring.

Nonetheless, the inherent complexity of big data poses new challenges for physicians and researchers eager to embrace these technologies. Moving beyond standard epidemiological and genetic analytics, successful big data projects will require expertise in complex tools such as machine learning and artificial intelligence. Because such expertise is often unavailable in-house at major academic medical centers, researchers will increasingly depend on interdisciplinary collaborations with engineers, computer scientists and mathematicians. These projects may also require collaborations beyond traditional academic and nonprofit spheres. For example, Google (through its subsidiary Verily Life Sciences) has partnered with Stanford and Duke Universities to launch the Baseline Study, a longitudinal effort to collect and analyze detailed medical records, whole-body MRI images, and samples of blood, urine, saliva and tears in order to develop diagnostics and algorithms to predict and prevent cardiovascular disease and cancer. Similarly, Ambry Genetics and Foundation Medicine have deposited large volumes of their data for public research projects. Given the strong interest and expertise of many information technology firms in big data, academic-industry partnerships will likely become a critical model for research training and collaboration.

Another barrier to fully realizing the potential of big data within the research community is the lack of adequate resources for such projects. Despite recent high-profile initiatives such as the National Institutes of Health Big Data to Knowledge Program (BD2K), President Obama's Precision Medicine Initiative and Vice President Biden's Cancer Moonshot program, interdisciplinary proposals remain at a significant disadvantage for government funding compared with proposals that fit within a single traditional area of expertise. Low overall funding rates further disadvantage big data projects.

To address these challenges, some subspecialty societies have already taken bold steps to bridge this funding gap. For example, the American Heart Association has created and endowed the Institute for Precision Cardiovascular Medicine to fund its big data investigators and to serve as an honest broker among academic institutions, industry and government. In the future, the American Gastroenterological Association, perhaps in partnership with other organizations, may be able to develop a similar vehicle to support big data investigators in digestive disease research.

Finally, irrespective of their active participation in big data analytics, all stakeholders in the health-care system will need to be carefully attuned to discoveries made with these technologies. As with genome-wide association studies, perhaps the first wave of big data, observations from future studies leveraging even more complex datasets will require validation, mechanistic studies and perhaps clinical trials before they are fully integrated into clinical care.

Dr. Gala is the co-founder of, and has equity in, New Amsterdam Genomics, Inc.

Dr. Chan consults for Bayer Healthcare, Pfizer and PLX Pharma. He also serves on the Council on Aspirin in Health Prevention Committee and the American Association for Cancer Research.