Today, IT professionals are challenged to find solutions for managing the big data generated at research labs, pharmaceutical companies and medical centers. Doing so requires the compute power, storage solutions, and analytic capability to make the data clinically actionable. Data from disparate sources, including omics (genomics, proteomics, metabolomics, etc.), imaging, and sensors, must be integrated. Cambridge Healthtech Institute's 2nd Annual Data Management in the Cloud program will bring together key leaders in the fields of cloud architecture and data management to share case studies and to discuss the challenges and solutions they face in their centers. Overall, this event will offer practical solutions for network engineers, data architects, software engineers, and others building data ecosystems that enable the goal of personalized medicine.

Final Agenda

Monday, March 11

10:30 am Conference Program Registration Open

CASE STUDIES OF SUCCESSFUL CLOUD IMPLEMENTATION

11:50 Chairperson’s Opening Remarks

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine

12:00 pm Explore the Genomic and Neoepitope Landscape of Pediatric Cancers on the Cloud

Jinghui Zhang, PhD, Chair, Member, Computational Biology, St. Jude Children’s Research Hospital

We will present the driver genes identified from a pan-cancer analysis of 1,699 pediatric cancers and neoepitopes identified from integrative analysis of whole-genome and RNA-seq.

12:30 The GenePattern Notebook Environment for Open Science and Reproducible Bioinformatics Research

Michael Reich, Assistant Director, Bioinformatics, Department of Medicine, University of California San Diego

Interactive analysis notebook environments promise to streamline genomics research through interleaving text, multimedia, and executable code into unified, sharable, reproducible “research narratives.” However, current notebook systems require programming knowledge, limiting their wider adoption by the research community. We have developed the GenePattern Notebook environment (http://www.genepattern-notebook.org), to our knowledge the first system to integrate the dynamic capabilities of notebook systems with an investigator-focused, easy-to-use interface that provides access to hundreds of genomic tools without the need to write code.

1:00 Session Break

1:10 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

2:10 Session Break

CASE STUDIES OF SUCCESSFUL CLOUD IMPLEMENTATION (CONT.)

2:30 Chairperson’s Remarks

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine


2:40 KEYNOTE PRESENTATION: Making STRIDES to Accelerate Discovery: The National Institutes of Health and the Cloud

Andrea T. Norris, CIO; Director, NIH Center for Information Technology, Health and Human Services, National Institutes of Health (NIH)

NIH has launched STRIDES, a new initiative to harness the power of commercial cloud computing and provide NIH biomedical researchers access to the most advanced, cost-effective computational infrastructure, tools and services available. The STRIDES Initiative launched with Google Cloud as its first industry partner and aims to reduce economic and technological barriers to accessing and computing on large biomedical data sets to accelerate biomedical advances.

3:10 Scaling On-Premises Containerized Workloads into Public Clouds

Katreena Mullican, Technology Director, HudsonAlpha Institute for Biotechnology

This presentation focuses on the benefits of hybrid cloud architecture and the research that HudsonAlpha IT is exploring to bridge on-premises compute, storage, and network architecture to public cloud. Also included is a demonstration of technology that can be implemented to automate the provisioning of on-premises composable infrastructure as well as public cloud resources, enabling scaling of containerized workloads.

3:40 UC Health Data Warehouse (UCHDW): An Azure Cloud Migration Case Study

Lisa Dahm, PhD, Director, UC Health Data Warehouse, Center for Data-Driven Insights and Innovation (CDI2), University of California Health, University of California Irvine

The University of California Health System has built a secure data warehouse (UCHDW) for operational improvement, promotion of quality patient care, and clinical research. The repository currently holds EHR data on 5 million patients from six UC medical centers, treated by 100,000 clinicians. To support secure, cross-institutional access to this data and analytics platform, a multiphase project is underway to move UCHDW into a HIPAA-compliant Azure cloud.

4:10 Sponsored Presentation (Opportunity Available)

4:40 Refreshment Break and Transition to Plenary Session


8:00 Plenary Keynote Session

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day

Tuesday, March 12

7:30 am Registration Open and Morning Coffee


8:00 Plenary Keynote Session

9:15 Refreshment Break in the Exhibit Hall with Poster Viewing

MANAGING DISPARATE DATA

10:15 Chairperson’s Remarks

Ian Fore, PhD, Senior Biomedical Informatics Program Manager, Center for Biomedical Informatics and Information Technology, National Cancer Institute

10:25 FEATURED PRESENTATION: A Data Commons Framework for Data Management

Robert Grossman, PhD, Frederick H. Rawson Professor, Professor of Medicine and Computer Science, Jim and Karen Frank Director, Center for Data Intensive Science (CDIS), Co-Chief, Section of Computational Biomedicine and Biomedical Data Science, Department of Medicine, University of Chicago


10:55 Building an Internet of Genomics

Marc Fiume, PhD, Co-Lead, Discovery Work Stream, Global Alliance for Genomics and Health; Co-Founder, CEO, DNAstack; Co-Founder, Canadian Genomics Cloud

11:25 Storage and Use of dbGaP Data in the Cloud

Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)

The National Center for Biotechnology Information (NCBI) database of Genotypes and Phenotypes (dbGaP) is an NIH-sponsored archive charged to store information produced by genome-scale studies. The next-generation sequence data deposited to dbGaP are processed and distributed by NCBI’s Sequence Read Archive (SRA). This presentation will describe how NCBI and MITRE have implemented access to large genomic datasets provisioned on the cloud, via dbGaP approval, thereby eliminating the need for download.

11:55 Sponsored Presentation (Opportunity Available)

12:25 pm Session Break

12:35 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:35 Refreshment Break in the Exhibit Hall with Poster Viewing

HYPE VS REALITY IN CLOUD CAPABILITIES

2:05 Chairperson’s Remarks

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

2:10 Cloud Transformation 2.0: Embracing the Multi-Cloud Future

Chris Dwan, Senior Technologist and Independent Life Sciences Consultant

Cloud technologies are mature and have achieved broad adoption. While this has brought many benefits, it also means that organizations must deal with legacy and migration challenges around their aging, decade-old cloud systems. The diversity of solutions in the marketplace means that cross-cloud interoperability, data locality, and functional “skew” between clouds can be significant challenges. This talk will share practical experience and success strategies for managing through this second decade of the cloud.

2:40 Overcoming Internal Hurdles to Cloud Adoption

Tanya Cashorali, CEO, Founder, TCB Analytics

With security, privacy, and performance concerns, many organizations in healthcare and life sciences are hesitant to roll out a cloud-based data and analytics environment. In this session, we’ll review common negative perceptions of the cloud, along with implementation strategies that help mitigate these concerns. We’ll also cover examples of healthcare and pharmaceutical companies that successfully moved to the cloud, and how they navigated pushback from IT and the business.

3:10 Genomics Analysis Powered by the Cloud

Ruchi Munshi, Product Manager, Data Sciences Platform, The Broad Institute

For years, computational biologists have used on-prem infrastructure for all their analytical needs. However, as the amount of genetic data grows, genomics analysis quickly becomes constrained by the compute resources available. Today, cloud platforms provide researchers access to so much compute that the next problem is learning how to use those resources effectively. Let’s talk about various tools that leverage cloud resources to power analysis of genetic data.

3:40 Sponsored Presentation (Opportunity Available)

4:10 St. Patrick’s Day Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

6:00 Close of Day

Wednesday, March 13

7:30 am Registration Open and Morning Coffee


8:00 Plenary Keynote Session

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall

DATA COMMONS PANEL SESSION

Moderator: Matthew Trunnell, Vice President, CIO, Fred Hutchinson Cancer Research Center

10:50 Open and Distributed Approaches to Biomedical Research

Michael Kellen, PhD, CTO, Sage Bionetworks

Today’s biomedical researchers are increasingly challenged to integrate diverse, complex datasets and analysis methods into their work. Sage Bionetworks develops open tools that support distributed, data-driven science and tests their deployment in a variety of research contexts. These experiences informed development of Synapse, a cloud-native informatics platform that serves as a data repository for dozens of multi-institutional research consortia working with large-scale genomics, bioimaging, clinical, and mobile health datasets.

11:00 The Data Commons/Data STAGE Initiatives

Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

This talk describes the NIH Data Commons and NHLBI Data STAGE initiatives. The Data Commons aims to establish a shared, universal virtual space where scientists can work with the digital objects of biomedical research, including data and analytical tools. A closely related project, Data STAGE, aims to use the Data Commons to drive discovery using diagnostic tools, therapeutic options, and prevention strategies to treat heart, lung, blood, and sleep disorders.

11:10 Innovation through Collaboration: New Data-Driven Research Paradigms Being Developed by the Pediatric and Rare Disease Communities

Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia

11:20 Building Trust in Large Biomedical Data Networks

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

11:30 PANEL DISCUSSION: Definitions, Challenges and Innovations of Data Commons

Moderator: Matthew Trunnell, Vice President, CIO, Fred Hutchinson Cancer Research Center

Panelists:

Stanley Ahalt, PhD, Director, Renaissance Computing Institute; Professor, Department of Computer Science, University of North Carolina, Chapel Hill

Adam C. Resnick, PhD, Director, Center for Data Driven Discovery in Biomedicine (D3b); Director, Neurosurgical Translational Research, Division of Neurosurgery; Director, Scientific Chair, Children’s Brain Tumor Tissue Consortium in Neurosurgery (CBTTC); Scientific Chair, Pediatric Neuro-Oncology Consortium (PNOC); Alexander B. Wheeler Endowed Chair in Neurosurgical Research, The Children’s Hospital of Philadelphia

Michael Kellen, PhD, CTO, Sage Bionetworks

Lucila Ohno-Machado, MD, PhD, Associate Dean, Informatics and Technology, University of California, San Diego Health

  • What is a data commons and what are the common challenges in building and maintaining data commons?
  • Why should you organize your data into a commons?
  • NIH Data Commons Pilot Phase updates and future directions
  • The role of data commons in promoting open access and open science
  • Technology innovations

12:30 pm Session Break

12:40 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing

DATA INTEGRATION, ANALYSIS, AND STORAGE

1:50 Chairperson’s Remarks

Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information (NCBI), National Institutes of Health (NIH)

2:00 FEATURED PRESENTATION: Cloud Infrastructure Enabling Pharma IT Transformation

David Smoley, CIO, AstraZeneca


2:30 Things I Didn’t Know I Needed to Know before Attempting to Implement a Cloud-Based Genomics Data Environment

Enoch S. Huang, PhD, Executive Director, Head of Computational Sciences, Pfizer Worldwide Research and Development

This talk will describe the factors behind a major pharma’s effort to move genomics data processing and analysis to a public cloud environment in collaboration with a leading academic institution. I will discuss unanticipated challenges associated with implementation, most of which were not technical or funding-related. Nevertheless, I am optimistic about the future of this platform, and will be sharing the reasons why I believe that this strategy will ultimately produce sustainable solutions for pharma R&D.

3:00 How the pRED Data Commons Facilitates Integration of –omics Data

Jan Kuentzer, Principal Scientist, Data Science, Data Science pRED Informatics, Roche Innovation Center Munich, Roche Diagnostics GmbH

Omics data increasingly influences clinical decision-making. Well-designed and highly integrated informatics platforms become essential for supporting structured data capture, integration and analytics to enable effective drug development. This talk presents principles and key learnings in designing such a platform, and contrasts our current approach with previous approaches in biomedical informatics. Finally, I will provide insights into the implementation of such a platform at Roche.

3:30 Session Break

PERSONALIZED MEDICINE STRATEGIES FOR CLINICALLY ACTIONABLE DATA

3:40 Chairperson’s Remarks

Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center

3:45 Precision Oncology Decision Support

Funda Meric-Bernstam, MD, Chair, Department of Investigational Cancer Therapeutics, MD Anderson Cancer Center

Molecular profiling is increasingly utilized in the management of cancer patients. Decision support for precision oncology includes guidance on optimal testing and interpretation of test results, including the functional impact of genomic alterations and their therapeutic implications. We will review strategies for decision support and resources for identifying optimal approved or investigational therapies.

4:15 High-Performance Integrated Virtual Environment (HIVE) and BioCompute Objects for Regulatory Sciences

Raja Mazumder, PhD, Associate Professor, Biochemistry and Molecular Medicine, George Washington University

Advances in sequencing technologies combined with extensive systems-level -omics analysis have contributed to a wealth of data which requires sophisticated bioinformatic analysis pipelines. Accurate communication describing these pipelines is critical for knowledge and information transfer. In my talk, I will provide an overview of how we have been engaging with the scientific community to develop BioCompute specifications that build a framework for standardizing the communication of bioinformatics computations and analyses with the US FDA. I will also describe how BioCompute Objects (https://osf.io/h59uh/) can be created using the High-performance Integrated Virtual Environment (HIVE) and other bioinformatics platforms.

4:45 Integrating Genomic and Immunologic Data to Accelerate Translational Discovery at the Parker Institute for Cancer Immunotherapy

Danny Wells, PhD, Scientist, Informatics, Parker Institute for Cancer Immunotherapy

Immunotherapy is rapidly changing how we treat both solid and hematologic malignancies, and combinations of these therapies are quickly becoming the norm. For any given treatment strategy only a subset of patients will respond, and an emerging challenge is how to effectively identify the right treatment strategy for each patient. This challenge is compounded by a concomitant explosion in the amount of data collected from each patient, from high-dimensional single-cell measurements to whole-exome tumor sequencing. In this talk, I will discuss translational research at the Parker Institute and how we are integrating multiple molecular and clinical data types to characterize the tumor-immune phenotype of each patient.

5:15 Close of Conference Program