The Value of R&D Data Accessibility for Exploratory Analysis
Life sciences organizations, from international pharmaceutical behemoths to bench-top biotechs, all generate mounds of non-clinical and clinical data that can and should be leveraged to support better patient outcomes, further exploratory research, and more agile science. Below, we explore the value of R&D data accessibility for exploratory analysis and considerations for companies looking to get started with an effective R&D data strategy.
Data Challenges in Life Sciences
Historically, R&D data was collected and analyzed to answer predetermined questions tied to specific program and therapeutic needs. Afterward, the data typically stayed within individual program silos, scattered across platforms and functional groups. These challenges are exacerbated by the growing number of tools and technologies in use (EDC, CTMS, eTMF, Box folders, Teams channels, SharePoint folders, email communications, and even paper forms, among others), combined with the natural functional silos that use these data sets. With the right strategy, years of work and scientific discoveries once destined to become fragmented, uncatalogued data files can instead be strategically stored, governed, and made accessible for accelerated analysis and discovery across disease groups.
Getting Started
To begin, R&D organizations must start at the end and work backward. Where is the organization strategically looking to make a difference in the way it uses its R&D data? Defining the end state allows R&D leaders, alongside data engineers and architects, to work backward and build the data pipelines necessary to enable the desired data access and analysis. Interest in the “Data Lake” concept is surging, and everyone is curious about how they can get their data into one searchable location. This is possible, but the key to achieving this vision is considering how to govern the data that will go into the data lake or data warehouse (Snowflake, Databricks, etc.). Necessary questions about the ownership, maintenance, data models, use cases, and security of the data, particularly when handling sensitive PII/PHI, need to be answered.
Creating a Data Catalog
With these questions answered, the data sources can be cataloged, and metadata tags can be applied. Creating a data catalog also prompts the development of a data glossary that establishes common language between sources, and it reveals the breadth of assets available to be leveraged for exploratory science. A data catalog can provide a single source of truth and help organizations make better decisions by providing a clear and complete view of the data while helping to avoid inconsistent versions of the same reference data across systems.
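To make the idea of catalog entries, metadata tags, and a shared glossary concrete, here is a minimal Python sketch. It is an illustration only, not a description of any particular catalog product: the field names, the example sources, the tags, and the glossary terms are all hypothetical assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One cataloged data source. All field names are hypothetical."""
    name: str
    source_system: str
    owner: str
    contains_phi: bool
    tags: set = field(default_factory=set)


# Hypothetical glossary: maps source-specific column names to one shared term.
GLOSSARY = {
    "subj_id": "subject_id",
    "patient_no": "subject_id",
    "usubjid": "subject_id",
}


def normalize_term(term):
    """Return the shared glossary term, or the cleaned input if none is defined."""
    key = term.strip().lower()
    return GLOSSARY.get(key, key)


def find_by_tag(catalog, tag):
    """Return every catalog entry that carries the given metadata tag."""
    wanted = tag.strip().lower()
    return [entry for entry in catalog if wanted in entry.tags]


# Hypothetical entries for illustration only.
catalog = [
    CatalogEntry("Phase 2 lab results", "EDC", "Clinical Operations", True,
                 {"oncology", "clinical"}),
    CatalogEntry("Assay plate reads", "SharePoint", "Discovery Biology", False,
                 {"oncology", "non-clinical"}),
    CatalogEntry("Site monitoring reports", "CTMS", "Clinical Operations", False,
                 {"clinical"}),
]

# Find all oncology assets regardless of which system they live in.
oncology_assets = [entry.name for entry in find_by_tag(catalog, "oncology")]
print(oncology_assets)
print(normalize_term("Patient_No"))
```

In this sketch, a single tag lookup surfaces both clinical and non-clinical assets for one disease area, and the glossary resolves differently named identifiers to a common term. A real catalog would typically live in a dedicated governance tool rather than in application code.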
Incorporating Change Management
Next, with a secure data catalog created by data professionals in collaboration with expert scientists, it is key to assess the change management aspect of the data transformation. Scientists inherently delve into data, make connections, and discover idiosyncrasies. Because of this passion, it may seem counterintuitive to put structure around data that is supposed to be exploratory. However, it’s important to emphasize and socialize that structure empowers flexibility. The more clearly data is organized, named, and tagged, the easier it will be to leverage across datasets. This is a “Walk to Run” approach: the initial categorization and cataloging of the data may take additional time and effort, but the result is the opportunity to analyze data more fluidly. Scientists will be empowered to discover nuances and the details behind surface-level correlations.
Selecting the Right Platform
With a myriad of tools already at play at an organization, there’s a decision to be made about which platforms should stay and which should be retired. The most common way organizations limit their growth is by using niche systems that solve challenges for individual functional groups. Grounding the data approach in enterprise tool selections and governance is what will make change sustainable. Enterprise technology and practices signal that senior leadership values consistency and is invested in the process of increasing R&D data accessibility and making the data easier to utilize. Key platforms and connections to consider are a data governance tool, data storage medium, ETL mechanisms, data visualization tool, and the processes and practices to maintain the system’s health and data journey.
Unlocking the Full Potential of Clinical Data
These decision-making conversations are where Clarkston data strategy experts partner with leading biotech and life sciences organizations to paint the full picture, develop a plan, and then execute on that vision. Clarkston’s data engineers paired with clinical industry thought leaders keep the flexibility and focus of science top of mind while solving the complex data challenges that inhibit pharmaceutical and biotechnology organizations from unlocking the full potential of their clinical data.