Bioinformatics for Big Data Conference

In the era of precision medicine, enormous amounts of data are being generated from disparate sources including omics, imaging, sensing and beyond. Today, computational scientists need to develop better tools to manage, integrate and share data to make it clinically actionable. The Bioinformatics for Big Data program at the Molecular Medicine Tri-Conference 2019 will showcase how medical centers and the pharma industry are developing such tools and software to meet this goal.

Final Agenda

Day 1 | Day 2 | Day 3 | Download Brochure

Arrive Early for:

SUNDAY, MARCH 10, 2:00 - 5:00 PM (AFTERNOON SHORT COURSES)

SC8: Data-Driven Process Development in the Clinical Laboratory - Detailed Agenda

SUNDAY, MARCH 10, 5:30 - 8:30 PM (DINNER SHORT COURSES)

SC12: Clinical Informatics: Returning Results from Big Data - Detailed Agenda

MONDAY, MARCH 11, 8:00 - 11:00 AM (MORNING SHORT COURSES)

SC24: Connected Diagnostics: IoT, Sensors and Wearables Bring Point-of-Care Dx to the Patient

Monday, March 11

10:30 am Conference Program Registration Open (South Lobby)

11:50 Chairperson’s Opening Remarks

Zhongming Zhao, PhD, Professor and Director, Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston

12:00 pm Identifying Actionable and Druggable Mutations from Cancer Big Data

Zhongming Zhao, PhD, Professor and Director, Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston

In this talk, I will first review the computational methods and tools for detecting cancer driver genes and mutations from cancer big data. Then I will present our informatics and biostatistics approaches for identifying cancer mutations and genes from a large amount of somatic mutation data. Finally, I will present an integrative network-based framework for identifying new druggable targets and anticancer indications from existing drugs.

12:30 AI/ML for Pharma R&D: Analytical Challenges and Opportunities

Ray Liu, PhD, Senior Director, Advanced Analytics and Statistical Consultation, Takeda

Drug development is a lengthy and costly process with a high attrition rate. Recent advancements in AI/ML have provided drug developers with the potential opportunity to generate novel insights from data. But AI/ML is not the panacea. When used blindly, AI/ML can do more harm than good. This presentation will discuss some sweet spots in pharma R&D for AI/ML to succeed.

1:00 Enjoy Lunch on Your Own

2:30 Chairperson’s Remarks

Zhongming Zhao, PhD, Professor and Director, Center for Precision Health, School of Biomedical Informatics, University of Texas Health Science Center at Houston

2:40 Wearables and Wired Health

Mike Snyder, PhD, Stanford W. Ascherman Professor and Chair, Department of Genetics; Director, Center for Genomics and Personalized Medicine, Stanford University

Wearable portable biosensors allow frequent measurement of health-related physiology. We have used smart watches and other devices to detect the onset of infectious diseases such as Lyme disease. We have used continuous glucose monitors to detect individuals with glucose dysregulation. Using these devices, we can build personalized models for monitoring health status and the early onset of disease.

3:10 Methods for Functional Microbiome by Shotgun Metagenomic Sequencing

Hongzhe Li, Professor of Biostatistics and Statistics, Director, Center for Statistics in Big Data, Chair, Biostatistics Graduate Program, Vice Chair for Integrative Research, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania

Shotgun metagenomic sequencing provides a powerful tool for studying the functions of microbial communities. Current methods mainly focus on quantifying microbial compositions and gene/pathway compositions. However, such data in combination with metabolomics data provide important information on the functional microbiome. I will present methods for quantifying microbiome growth dynamics and for predicting the metabolic potential of a given microbial community, and show how to use these quantities to study disease and treatment outcomes.

3:40 Quantifying Wellness Using Personal, Dense, Dynamic Data Clouds

John Earls, PhD, Senior Software Engineer, Institute for Systems Biology

We used personal, dense, dynamic data clouds (pD3 clouds), where thousands of multi-modal, longitudinal measurements quantify individual health status, to estimate the biological age of thousands of individuals. I will present our work on integrating measurements of clinical labs, proteomics, metabolomics, and genetics to better understand and quantify wellness through the lens of aging. I will show how the aging process affects these measures and how deviations of biological age from chronological age are manifested in disease. I will also present results demonstrating the effect of lifestyle

4:10 From Data to Insight: Becoming Information-Driven with AI-Powered Search & Analytics

Gregoire Jozan, Solution Architect, Sinequa

Pharmaceutical organizations are swamped with structured and unstructured data, buried among trade databases, scientific publications, clinical trials, and other sources. Learn how you can extract meaningful insights from the multitude of sources and repositories with the help of AI-powered search.

4:40 Refreshment Break and Transition to Plenary Session

8:00 Plenary Keynote Session (Room Location: 3 & 7)

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day

Tuesday, March 12

7:30 am Registration Open and Morning Coffee (South Lobby)

8:00 Plenary Keynote Session (Room Location: 3 & 7)

9:15 Refreshment Break in the Exhibit Hall with Poster Viewing

10:15 Chairperson’s Remarks

Hongzhe Li, Professor of Biostatistics and Statistics, Director, Center for Statistics in Big Data; Chair, Biostatistics Graduate Program, Vice Chair for Integrative Research, Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania

10:25 Integrating and Analyzing Heterogeneous Data at Scale to Drive Discovery Biology

Vivek Ramaswamy, Senior Software Engineer, Bioinformatics, Genentech

Our talk will focus on our attempts to integrate data related to genes, variants, cell types, tissues, diseases, animal knock-outs, phenotypes, and pathways, and we will share the challenges and accomplishments for a long-term GERMLINE project.

10:55 Unlocking the Data Trapped within the Electronic Health Record Using EMERSE

David Hanauer, MD, Program Director for Clinical Informatics, Michigan Institute for Clinical and Health Research, Associate CMIO, Michigan Medicine

The most detailed clinical data are trapped within free-text clinical notes, and these data are needed when the structured/coded data are inaccurate or incomplete. For over a decade, Michigan Medicine has been developing and using an open-source search engine designed for clinical notes, called EMERSE (Electronic Medical Record Search Engine). EMERSE has been used to support a wide range of operational, clinical, and research tasks.

11:25 Heterogeneity in “Dirty Data”: Blessings in Disguise for Accelerating Translational Medicine

Purvesh Khatri, PhD, Associate Professor, Stanford University School of Medicine

This talk will discuss translational bioinformatics approaches to translational medicine in the broad domains of autoimmunity, infection, and inflammation.

11:55 Scientific Information Management (SIM) - Elevating the Health and Science Process to the Next Level

Robert Zeigler, PhD, Director of Customer Solutions, L7 Informatics, Inc.

Precision medicine and new classes of treatments, including gene and cell therapies, require a new category of companion informatics platforms that automate and synchronize complex drug discovery and therapeutic processes. This talk will discuss how to enable SIM from bench to bedside in life sciences and healthcare organizations with real-world case studies.

12:10 pm Late Breaking Presentation

12:25 Enjoy Lunch on Your Own

1:35 Refreshment Break in the Exhibit Hall with Poster Viewing

2:05 Chairperson’s Remarks

Olga Sazonova, PhD, Product Scientist II, 23andMe

2:10 Natural Language Processing for Clinical and Translational Research

Hua Xu, PhD, Professor, Director, Center for Computational Biomedicine, The University of Texas Health Science Center at Houston, School of Biomedical Informatics

Over the past few decades, growing use of electronic health record (EHR) systems has established large practice-based clinical datasets, which are emerging as valuable resources for clinical and translational research. One of the major challenges of using EHRs for clinical research is that much of the detailed patient information is embedded in narrative reports. This presentation will describe our recent development of natural language processing (NLP) methods and software for extracting phenotypic information from clinical text in EHRs, as well as how such NLP methods and tools can be used to support clinical research, such as drug outcome studies.

2:40 Discover, Predict, Prevent: 23andMe and the Mission of Personalized Healthcare, Part 1

Olga Sazonova, PhD, Product Scientist II, 23andMe

23andMe has built the world’s largest consented, re-contactable database for genetic research, with more than four million consented participants and one billion individual survey responses. 23andMe researchers leverage this unprecedented resource by applying statistical genetics and machine learning to a) uncover novel genetic risk factors for complex disease, b) advance drug discovery, and c) offer personalized predictions of disease risk to all 23andMe customers.

3:10 Discover, Predict, Prevent: 23andMe and the Mission of Personalized Healthcare, Part 2

Sarah Laskey, PhD, Scientist, Health R&D, 23andMe

In addition to characterizing and treating disease, researchers at 23andMe are working toward a future of personalized disease prevention. Researchers are building models to estimate disease risk based on genetics, lifestyle, environment, and behavior, and data collection at 23andMe is expanding its focus to longitudinal surveys and interventional studies, allowing researchers to move from association and correlation to causation — what actions can people take to get results?

3:40 Precision Medicine in the Big Data Era - Key Challenges and Successful Approaches

Thomas Jensen, PhD, CEO, Intomics

The talk will present successful approaches for identifying responding patient subpopulations in both oncology and non-oncology client case studies. A key factor in this is applying Intomics' proprietary Protein-Protein Interaction Network as an important supplement to pathways for data interpretation.

4:10 St. Patrick’s Day Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall (see website for details)

6:00 Close of Day

Wednesday, March 13

7:30 am Registration Open and Morning Coffee (South Lobby)

8:00 Plenary Keynote Session (Room Location: 3 & 7)

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall

Moderator: Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center

10:50 Open and Distributed Approaches to Biomedical Research

Michael Kellen, PhD, CTO, Sage Bionetworks

Today’s biomedical researchers are increasingly challenged to integrate diverse, complex datasets and analysis methods into their work. Sage Bionetworks develops open tools that support distributed, data-driven science and tests their deployment in a variety of research contexts. These experiences informed development of Synapse, a cloud-native informatics platform that serves as a data repository for dozens of multi-institutional research consortia working with large-scale genomics, bioimaging, clinical, and mobile health datasets.

11:00 The Data Commons/Data STAGE Initiatives

Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

This talk describes the NIH Data Commons and NHLBI Data STAGE initiatives. The Data Commons aims to establish a shared, universal virtual space where scientists can work with the digital objects of biomedical research, including data and analytical tools. A closely related project, Data STAGE, aims to use the Data Commons to drive discovery using diagnostic tools, therapeutic options, and prevention strategies to treat heart, lung, blood, and sleep disorders.

11:10 Innovation through Collaboration: New Data-Driven Research Paradigms Being Developed by the Pediatric and Rare Disease Communities

Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia

11:20 Building Trust in Large Biomedical Data Networks

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

11:30 PANEL DISCUSSION: Definitions, Challenges and Innovations of Data Commons

Moderator: Matthew Trunnell, Vice President, Chief Data Officer, Fred Hutchinson Cancer Research Center

Panelists: Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

Michael Kellen, PhD, CTO, Sage Bionetworks

What is a data commons and what are the common challenges in building and maintaining data commons?
Why should you organize your data into a commons?
NIH data commons pilot phase updates and future directions
The role of data commons in promoting open access and open science
Technology innovations

12:30 pm Enjoy Lunch on Your Own

1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing

1:50 Chairperson’s Remarks

Matthew Lebo, PhD, FACMG, Director, Bioinformatics, Partners Personalized Medicine; Instructor, Pathology, Brigham and Women’s and Harvard Medical School

2:00 Machine Learning for Data Driven Decision Making of Clinical Trials

Kevin Hua, PhD, Senior Manager, AI Machine Learning Development, Digital Health Intelligence Group, Bayer

Clinical trials are a major business expense. Advances in AI/machine learning and data mining technology, together with the availability of data, make data-driven decision making possible in drug development. We will present a case study in which wearable devices and deep learning models are used to help clinical scientists make faster and more accurate decisions during clinical trials.

2:30 Informatics Approaches to Reducing the Sanger Burden in Clinical NGS Laboratories

Matthew Lebo, PhD, FACMG, Director, Bioinformatics, Partners Personalized Medicine; Instructor, Pathology, Brigham and Women’s and Harvard Medical School

Recent work has highlighted the accuracy and completeness of NGS such that these additional assays may not be required, especially in the realm of orthogonal confirmation of variants. However, many of these studies have been underpowered to accurately define thresholds for ensuring high confidence in NGS variant calling. In this talk, we’ll discuss algorithmic and machine learning approaches to tackle this problem, demonstrating the ability to dramatically reduce, but crucially not eliminate, the burden of orthogonal confirmation in germline NGS assays.

3:00 From Pixels to Phenotypes: Analysis of Cellular Images with Multi-Scale Convolutional Neural Networks

William J. Godinez, PhD, Research Investigator, Novartis Institutes for BioMedical Research (NIBR)

Large-scale cellular imaging and phenotyping is a widely adopted strategy for understanding biological systems and chemical perturbations. Quantitative analysis of cellular images for identifying phenotypic changes is a key challenge within this strategy, and has recently seen promising progress with approaches based on deep learning. In this talk, we describe our approaches based on deep multi-scale convolutional neural networks for phenotyping cellular images. We discuss supervised as well as unsupervised learning strategies, with the latter requiring no phenotypic labels for training. We present an example application based on images of E. coli bacteria to show how we use machine learning to predict the binding preferences of antibiotics directly from microscopy image data.

3:30 Session Break

3:40 Chairperson’s Remarks

Funda Meric-Bernstam, MD, Chair, Executive, Investigational Cancer Therapeutics, MD Anderson Cancer Center

3:45 Precision Oncology Decision Support

Funda Meric-Bernstam, MD, Chair, Executive, Investigational Cancer Therapeutics, MD Anderson Cancer Center

Molecular profiling is increasingly utilized in the management of cancer patients. Decision support for precision oncology includes guidance of optimal testing, interpretation of test results including interpretation of functional impact of genomic alterations and therapeutic implications. We will review strategies for decision support and resources for identifying optimal approved or investigational therapies.

4:15 High-Performance Integrated Virtual Environment (HIVE) and BioCompute Objects for Regulatory Sciences

Raja Mazumder, PhD, Associate Professor, Biochemistry and Molecular Medicine, George Washington University

Advances in sequencing technologies combined with extensive systems-level omics analysis have contributed to a wealth of data, which requires sophisticated bioinformatic analysis pipelines. Accurate communication describing these pipelines is critical for knowledge and information transfer. In my talk, I will provide an overview of how we have been engaging with the scientific community to develop BioCompute specifications to build a framework to standardize the communication of bioinformatics computations and analyses with the US FDA. I will also describe how BioCompute Objects (https://osf.io/h59uh/) can be created using the High-performance Integrated Virtual Environment (HIVE) and other bioinformatics platforms.

4:45 Integrating Genomic and Immunologic Data to Accelerate Translational Discovery at the Parker Institute for Cancer Immunotherapy

Danny Wells, PhD, Scientist, Informatics, Parker Institute for Cancer Immunotherapy

Immunotherapy is rapidly changing how we treat both solid and hematologic malignancies, and combinations of these therapies are quickly becoming the norm. For any given treatment strategy only a subset of patients will respond, and an emerging challenge is how to effectively identify the right treatment strategy for each patient. This challenge is compounded by a concomitant explosion in the amount of data collected from each patient, from high-dimensional single-cell measurements to whole-exome tumor sequencing. In this talk, I will discuss translational research at the Parker Institute and how we are integrating multiple molecular and clinical data types to characterize the tumor-immune phenotype of each patient.

5:15 Close of Conference Program

Stay Late for:

MARCH 14-15

S10: Data Science, Precision Medicine and Machine Learning – Detailed Agenda
