eTech Insight – Data Aggregation Solutions Drive Big Data Accessibility

Creating a Big Data Environment for Improved Analytics and AI

Healthcare providers are keenly aware of the need to allow their data to be used by cohort organizations and technology companies to establish big data repositories that provide more accurate analyses and more sophisticated AI algorithms. Organizations such as Mayo Clinic sell their data to technology companies to facilitate advancements in AI[1]. Patient advocates are concerned that the data may not be deidentified to the extent necessary to protect patient identities, that patients may not have opted in for their data to be used for these projects, or that patients can’t share in the monetary rewards that providers receive for the use of their data.

The COVID-19 pandemic has revived the challenges of creating big data repositories that can be used to help identify new treatments, populations at risk, and potential medications that help with recovery from, or prevention of, the coronavirus. The need to share large population data sets for research purposes will likely drive new government regulations that support the sharing of this data while also providing the necessary patient protections.

While several government databases are provided to support data analytics in provider organizations, they are not robust enough or extensive enough to support the more sophisticated machine learning or deep learning processes[2]. The ability to aggregate data sets from cohort organizations is a requirement for resolving this issue from a technology standpoint.

Emerging Data Acquisition Solutions Will Enable Big Data Analytics

Solutions are emerging for a market segment that I will call “Data Acquisition Solutions.” These solutions provide access to many different types of data, including medical, pharmacy, EHR, lab, imaging, consumer device, SDoH, genomic, disease registry, claims, and wearable device data, as well as data from cohort organizations that are willing to share it.

These solutions provide unique data-matching capabilities and data-deidentification processes that protect patient confidentiality. These functions will enable the solutions to be implemented successfully for creating highly accurate analyses of chronic disease and general patient populations. The expected outcomes will be improved AI-based clinical decision support and evidence-based medicine protocols. As the breadth and depth of data from these solutions continue to mature, providers will be able to apply continuous quality improvement methods to further optimize AI and evidence-based medicine protocols.
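To make the combination of data matching and deidentification more concrete, here is a minimal Python sketch. It is not any vendor's method and not a complete deidentification process. It assumes a hypothetical shared secret (`LINKAGE_KEY`), hypothetical field names (`name`, `dob`, `zip`, `dx`, `paid`), and a simple keyed hash of the normalized name and date of birth as the linkage token. Real solutions would likely use more robust matching and stricter privacy controls.

```python
import hashlib
import hmac

# Hypothetical shared secret, for illustration only. In practice a trusted
# party would manage this key and it would never appear in source code.
LINKAGE_KEY = b"example-only-secret"

# Hypothetical direct identifiers that are stripped before data is shared.
DIRECT_IDENTIFIERS = {"name", "dob", "zip"}


def linkage_token(record: dict) -> str:
    """Derive a keyed hash from the normalized name and date of birth."""
    normalized = f"{record['name'].strip().lower()}|{record['dob']}"
    return hmac.new(LINKAGE_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict) -> dict:
    """Replace direct identifiers with a linkage token and coarsened fields."""
    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    shared["token"] = linkage_token(record)
    shared["birth_year"] = record["dob"][:4]  # keep the year only
    shared["zip3"] = record["zip"][:3]        # keep the first three ZIP digits
    return shared


def match(source_a: list[dict], source_b: list[dict]) -> list[dict]:
    """Join two deidentified data sets on the linkage token."""
    by_token = {r["token"]: r for r in source_b}
    return [{**r, **by_token[r["token"]]} for r in source_a if r["token"] in by_token]


# Made-up sample records from two hypothetical sources.
ehr = [{"name": "Ada Smith ", "dob": "1970-03-14", "zip": "55901", "dx": "E11.9"}]
claims = [
    {"name": "ada smith", "dob": "1970-03-14", "zip": "55901", "paid": 120.0},
    {"name": "Bo Lee", "dob": "1988-01-02", "zip": "84101", "paid": 40.0},
]

linked = match([deidentify(r) for r in ehr], [deidentify(r) for r in claims])
```

In this sketch the EHR and claims records for the same person link up even though the name is formatted differently, and the linked record carries the diagnosis and payment fields without the name, full date of birth, or full ZIP code.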

The data acquisition solutions will likely be a catalyst for aggregating genomics data for analysis with medication, diagnostic, and outcomes data that will support precision medicine for improving the treatments of chronic diseases and cancer.

The data acquisition solutions can be integrated with provider enterprise data warehouse solutions and data visualization tools to extend the value of those investments. This results in an immensely powerful big data environment that enhances descriptive, predictive, and prospective analytic capabilities. Organizations that possess these data analytics and AI capabilities will be better positioned to survive the post-COVID-19 world of healthcare delivery.

Big Data Environments Optimize Healthcare Delivery Services

COVID-19 highlighted the significant gaps provider organizations have in extending and delivering effective and efficient healthcare services. The ability to close those gaps will be driven by analyzing patient data across all modalities of care. The ability to include data from provider cohorts and other trusted data sources (e.g., Google[3] and Epic Health[4]) will further enhance a provider’s ability to create and drive improvements to healthcare delivery services that are high in quality outcomes and patient satisfaction. Many of the healthcare delivery service improvements will be supported by AI solutions that require big data environments to create effective algorithms. Providers who implement data acquisition solutions to support AI and improved protocols will likely have first-mover advantages in their markets.

Big Tech and Emerging Innovators

The data acquisition solution market includes big technology companies such as Google, whose Project Nightingale is driving to acquire patient data to create big data environments that support analytics and the development of AI across healthcare. Representative emerging vendors in this market are:

Mayo Clinic, Stanford, and Intermountain Healthcare are examples of providers with data sharing services.

Success Factors

  1. Ensure that any liability arising from the use of data provided by data acquisition solution vendors is borne solely by the vendor. Providers who decide to share their data with cohorts or other medical companies need to establish strict data use and management agreements as defined by their legal counsel.
  2. Get buy-in from the patient advocate executive for the creation of big data environments relative to deidentification and patient confidentiality compliance.
  3. Validate the ability to integrate data acquisition solution data sets with the current enterprise data warehouse solution.


The ability to acquire data from other trusted data sources to create a big data environment will be crucial for providers to optimize their data analytics to improve operations, standard medical protocols, and AI algorithms for clinical, financial, and supply chain decision support. COVID-19 has exposed critical gaps in care delivery and supply chain management that will drive the transformation of many provider modalities of care and workflow processes.

As the healthcare industry advances with the use of data for managing and optimizing services and operations, new data sources will be required to remain competitive. Acquiring a data acquisition solution will likely position the provider organization as a top competitor in its market and may provide a new revenue source if the organization shares its data, which could become more important as healthcare reimbursement models change.

Mollifying patient advocacy groups will be a short-term challenge, but one that is likely to be resolved due to the tremendous potential of exposing large data sets for medical research[5]. To quote Martin Uzochukwu Ugwu, “Tenacity
     Photo Credit: Adobe Stock, wladimir1804