Every facet of healthcare and medicine now generates, and has access to, enormous amounts of data from sources and organizations around the world. The pharmaceutical industry plays a key role in driving informatics for translational research and precision medicine. The 11th Annual Integrated Pharma Informatics program will discuss the challenges related to integrating, analyzing, and interpreting data from clinical trials, sequencing, electronic health records, and wearables. We will discuss informatics strategy for entire organizations, from business goals to infrastructure and storage projects. Special attention will be paid to artificial intelligence, machine learning, natural language processing, and how companies are integrating these tools into their informatics infrastructure. We will take a close look at how informatics is driving translational and clinical research projects, with a focus on data standardization and integration. Join informatics experts from pharma, biotech, and biomedical research communities to discuss these challenges and real-world examples of informatics projects driving precision medicine.

Final Agenda


Monday, March 11

10:30 am Conference Program Registration Open


11:50 Chairperson’s Opening Remarks

Tom Plasterer, Director, Semantic Technologies, Science & Enabling Units IT, AstraZeneca

12:00 pm Transforming R&D with NLP, Machine Learning, and Advanced Analytics. It’s All About the People!

Dany DeGrave, Founder, Unconventional Innovation

A view from the trenches. Having implemented several AI projects using NLP, machine learning, and/or advanced analytics, it became clear that the success of these projects heavily depends on the people involved in the project, and for more than one reason. Through a use case, I’ll explain why and what to watch out for in your upcoming AI project.

12:30 FAIR Data Knowledge Graphs

Tom Plasterer, Director, Semantic Technologies, Science & Enabling Units IT, AstraZeneca

FAIR data has flown up the hype curve without a clear sense of return from the required data stewardship investment. The killer use case for FAIR data is a science knowledge graph. It enables you to ask novel questions of your own and the world’s data. To make this happen, we started with data catalogues (findability) that exploited linked/referenced data using a few focused vocabularies (interoperability), for credentialed users (accessibility), with provenance and attribution (reusability).

1:00 Session Break

1:10 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

2:10 Session Break


2:30 Chairperson’s Remarks

Albert Wang, Head, IT Business Partner, Translational Research & Technologies, R&D IT, Bristol-Myers Squibb

2:40 Building a Digital Health Data Infrastructure

Albert Wang, Head, IT Business Partner, Translational Research & Technologies, R&D IT, Bristol-Myers Squibb

BMS is currently executing a broad Digital Health strategy, supporting our Research, Development, and Commercial functions. This talk will describe a vision for a unified patient-centric data platform, including clinical trial data, translational research data, and real-world evidence.

3:10 Data Democratization: Developing Self-Service Data Exploration and Analysis Platforms to Enable Faster, High-Quality Scientific Discovery

Dana Caulder, Director, Software Engineering, Bioinformatics and Computational Biology, Genentech

It is essential that bench scientists have direct access to data and sophisticated, relevant analysis methods so that they can self-explore and generate scientific hypotheses. Computational scientists should not be a bottleneck in this process. I will present our approach and progress in developing a variety of self-service tools, ranging from in vivo data analysis to genomics and genetics data exploration and analysis.

3:40 Therapeutic Molecule Foresight Tool Saving Lives and Saving Billions of Dollars

Ahmed Abdeen Hamed, PhD, Applied Computer Scientist, Quantum Computing, Merck & Co.

This talk will present a novel network algorithm called SpecRank that specializes in searching and ranking molecules using the biomedical literature. Starting with a disease-related set of publications of interest, a feature extraction step is performed to identify the biological features associated with the drugs under study. SpecRank is a network centrality algorithm that is essential for deriving a rank when specificity is in question. Results highlight the promise that SpecRank offers in two respects: (a) the theoretical contribution of a novel specificity-based centrality measure, which may be viewed as a summary of common centrality measures, and (b) a practical search-and-rank tool whose importance in designing and discovering drugs we have demonstrated.

4:10 New Solutions for Research Informatics

Tim Parrott, Application Scientist, ChemAxon

Building on 20+ years of providing software for chemistry informatics, ChemAxon is creating new solutions for capturing, searching, analyzing and sharing scientific data. Recent developments will be highlighted in this overview of our portfolio.

4:25 Addressing Data Management Challenges in Genomics Research

Dale Curtis, President, Astrix Technology Group

Biotechnology firms engaged in genomics research are experiencing significant challenges in managing increasing data volume, variety and complexity. In this talk, we will discuss best practices in building the laboratory informatics infrastructure to effectively address these challenges. 


4:40 Refreshment Break and Transition to Plenary Session

5:00 Plenary Keynote Session

6:00 Grand Opening Reception in the Exhibit Hall with Poster Viewing

7:30 Close of Day


Tuesday, March 12

7:30 am Registration Open and Morning Coffee

8:00 Plenary Keynote Session

9:15 Refreshment Break in the Exhibit Hall with Poster Viewing


10:15 Chairperson’s Remarks

Ryan Copping, Global Head of Analytics, Personalized Healthcare Data Science, Genentech

10:25 Linking Complex Data Sources in RWE and Personalized Medicine

Ryan Copping, Global Head of Analytics, Personalized Healthcare Data Science, Genentech

10:55 Initiatives for Using RWD to Improve Drug Discovery and Clinical Trials

Mark Kruger, Head of Data Management & Analytics (Real World Evidence & Clinical Outcomes), Sanofi

This talk will discuss initiatives to use real-world data to improve drug discovery. We will also discuss how this data can be applied in clinical trials.

11:25 Smarter Medical Research & Learning Health Systems through Oncology Real World Data

Lauren Becnel, PhD, Head, Global Real World Evidence - Oncology, Global Real World Evidence, Pfizer

New strategies are needed to reduce the time it takes to translate new medical research discoveries to the clinic, while ensuring that they are safe and measurably improve care. To have true learning health systems, we must better integrate clinical research and healthcare. This talk will provide examples of how this translation is beginning to occur in oncology, ultimately leading to delivering the right medicines at the right time to the right patients.

11:55 Presentation to be Announced

12:10 pm Solving Challenges in Biologics Drug Discovery with Integrated Informatics

Andrew LeBeau, PhD, Senior Manager, Biologics Marketing, Dotmatics, Inc.

Biologics drug discovery requires informatics systems that support both the diverse range of biologics therapeutic types and traditional competencies such as screening, inventory management, etc. Well-integrated systems facilitate faster and fully informed decision-making, leading to better outcomes. This talk will highlight key requirements of an integrated system for biologics discovery.

12:25 Session Break

12:35 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:35 Refreshment Break in the Exhibit Hall with Poster Viewing


2:05 Chairperson’s Remarks

Lauren Becnel, Senior Director, Real World Data, Analytics & Data Strategy, Pfizer

2:10 Utilizing Your Virtual Twin to Improve Health Outcomes

Stuart Morton, PhD, Senior Research Scientist, LRL Informatics Capabilities, Eli Lilly

Previous uses of electronic healthcare records (EHRs) were limited to billing and administrative purposes. However, as EHRs continue to integrate health data from multiple sources such as hospitals and clinics/labs, the ability to mine that data for more effective patient outcomes is becoming a reality. The Medicinal MatchMaker is a treatment assessment tool that helps physicians improve outcomes by using data collected from the health outcomes of the patients who most resemble their own patient.

2:40 Developing Digital Endpoints: Case Studies on Data Collection, Value Extraction and Validation

Ieuan Clay, PhD, Group Lead, Digital Endpoints, Translational Medicine, Novartis Institutes for Biomedical Research

The digital endpoint field is rapidly evolving and maturing, with ever more examples emerging of how technology and analytics are helping us understand patient lives. We present several case studies and lessons learned on how we are approaching data ingestion, algorithm development, and validation in order to capture novel, QoL-relevant insights on a population and patient level.

3:10 Illuminating the Path Towards Precision Medicine with Apollo and UK Biobank

Eric Talevich, PhD, Senior Scientist, DNAnexus

DNAnexus Apollo enables interactive and visual analysis of heterogeneous, population-scale datasets within a secure network. We present applications using the UK Biobank to understand contributions of genetic predisposition to the development of disease, and generalize to clinical and observational studies.


3:40 PANEL DISCUSSION: Hurdles in Real-World Data: Where Do We Go from Here?

Session Speakers

This panel will discuss the current and future challenges in real-world data, from EHRs to digital biomarkers. How is data curated, integrated, and analyzed? What barriers do we face in standardization? What’s next for using this data in R&D?

4:10 St. Patrick’s Day Celebration in the Exhibit Hall with Poster Viewing

5:00 Breakout Discussions in the Exhibit Hall

6:00 Close of Day


Wednesday, March 13

7:30 am Registration Open and Morning Coffee

8:00 Plenary Keynote Session

10:00 Refreshment Break and Poster Competition Winner Announced in the Exhibit Hall


10:50 Chairperson’s Remarks

Pankaj Agarwal, PhD, FRSB, Senior Fellow & Senior Director, Computational Biology, RD Target Sciences, GSK


11:00 Machine Learning and Statistical Approaches for Reverse Translation

Sandor Szalma, PhD, Global Head, Computational Biology, Takeda Pharmaceuticals


We have recently implemented a global computational biology team and have been expanding its capabilities to enable reverse translation across our diseases of interest. In this presentation, I will discuss our computational infrastructure approach and a couple of initial computational experiments that explore real-world data and machine learning methodologies to better understand patient journeys in support of the research and development organization.

11:30 Diagnosing Rare Disease Patients: Progress in Fully Automated Diagnosis

Tom Defay, Senior Director, R&D Strategy and Alliances, SPMD, Strategy, Program Management and Data Sciences, Alexion


Diagnosing patients with rare disease is challenging. Whole exome and whole genome sequencing have improved our diagnostic abilities, but in many cases a patient carries potentially pathogenic mutations in many potentially disease-causing genes. By combining phenotypic information automatically extracted from the patient’s EMR with the patient’s genome sequence, we have developed a system for proposing possible diagnoses. The effectiveness and potential utility of this approach will be discussed.

12:00 pm Projects Colossus and Enigma: The Use of AI Methods in Early Drug Discovery and Late Drug Development

Kim Branson, PhD, Head of A.I (ECDi), Genentech 

The historical repository of clinical trial data represents some of the most costly and important data in pharmaceutical development. Clinical trial data is often analyzed by function (biomarker, safety, pharmacokinetic, etc.). We present the development of the Colossus project at Genentech, an extensible framework integrating clinical, pathology, imaging, RNA, and genomic data. Using this data, we are able to build models to predict responders and non-responders, clinical trajectory, and adverse event likelihood. With these models, we were able to identify a new responder subpopulation, which replicated across trials and indications. The Enigma project uses deep learning methods in the discovery and formulation of novel small molecules. Deep learning and generative methods have produced models with greater predictive power and more robust characteristics than previous methods.

12:30 Session Break

12:40 Luncheon Presentation (Sponsorship Opportunity Available) or Enjoy Lunch on Your Own

1:10 Refreshment Break in the Exhibit Hall and Last Chance for Poster Viewing


1:50 Chairperson’s Remarks

Michael Feolo, Staff Scientist, dbGaP Team Lead, National Center for Biotechnology Information, NIH

2:00 FEATURED PRESENTATION: Cloud Infrastructure Enabling Pharma IT Transformation

David Smoley, CIO, AstraZeneca

2:30 Talk Title to be Announced

Irene Pak, Lead R&D Data Architect, Bristol-Myers Squibb

3:00 How the pRED Data Commons Facilitates Integration of –omics Data

Jan Kuentzer, Principal Scientist, Data Science, Data Science pRED Informatics, Roche Innovation Center Munich, Roche Diagnostics GmbH

Omics data increasingly influences clinical decision-making. Well-designed and highly integrated informatics platforms are becoming essential for supporting structured data capture, integration, and analytics to enable effective drug development. This talk presents principles and key learnings in designing such a platform and contrasts our current approach with previous approaches in biomedical informatics. Finally, I will provide insights into the implementation of such a platform at Roche.

3:30 Session Break


3:40 Chairperson’s Remarks

Funda Meric-Bernstam, MD, Chair, Executive, Investigational Cancer Therapeutics, MD Anderson Cancer Center

3:45 Precision Oncology Decision Support

Funda Meric-Bernstam, MD, Chair, Executive, Investigational Cancer Therapeutics, MD Anderson Cancer Center

Molecular profiling is increasingly utilized in the management of cancer patients. Decision support for precision oncology includes guidance on optimal testing and interpretation of test results, including the functional impact of genomic alterations and their therapeutic implications. We will review strategies for decision support and resources for identifying optimal approved or investigational therapies.

4:15 High-Performance Integrated Virtual Environment (HIVE) and BioCompute Objects for Regulatory Sciences

Raja Mazumder, PhD, Associate Professor, Biochemistry and Molecular Medicine, George Washington University

Advances in sequencing technologies combined with extensive systems-level -omics analysis have contributed to a wealth of data that requires sophisticated bioinformatic analysis pipelines. Accurate communication describing these pipelines is critical for knowledge and information transfer. In my talk, I will provide an overview of how we have been engaging with the scientific community to develop BioCompute specifications, which provide a framework to standardize how bioinformatics computations and analyses are communicated to the US FDA. I will also describe how BioCompute Objects (https://osf.io/h59uh/) can be created using the High-performance Integrated Virtual Environment (HIVE) and other bioinformatics platforms.

4:45 Integrating Genomic and Immunologic Data to Accelerate Translational Discovery at the Parker Institute for Cancer Immunotherapy

Danny Wells, PhD, Scientist, Informatics, Parker Institute for Cancer Immunotherapy

Immunotherapy is rapidly changing how we treat both solid and hematologic malignancies, and combinations of these therapies are quickly becoming the norm. For any given treatment strategy, only a subset of patients will respond, and an emerging challenge is how to effectively identify the right treatment strategy for each patient. This challenge is compounded by a concomitant explosion in the amount of data collected from each patient, from high dimensional single cell measurements to whole exome tumor sequencing. In this talk, I will discuss translational research at the Parker Institute, and how we are integrating multiple molecular and clinical data types to characterize the tumor-immune phenotype of each patient.

5:15 Close of Conference Program