

The CMI-Flu central database


Our goal is to provide simplified access to a wide range of influenza studies, supporting both the building of large computational models and the extraction of specific subsets of data for targeted questions. Our pipeline for collecting and integrating data is shown in Figure x and described below.

Sample and data collection: Multi-assay profiling of influenza responses


Using state-of-the-art assays, we will profile multiple dimensions of the immune response. Innate immunity will be characterized via RNA sequencing of PBMCs, high-dimensional flow cytometry, and multiplex cytokine assays. Adaptive immunity will be profiled by measuring influenza-specific CD4+ T cell responses using activation-induced marker (AIM) assays and TCR sequencing, as well as B cell responses through BCR repertoire analysis, hemagglutination inhibition (HAI) assays, and neutralization tests. Complementary genetic profiling (using SNP arrays and targeted sequencing of HLA, IG, and TR loci) will further enrich our dataset. These data, generated across the spectrum of immune responses, will provide standardized measures over a four-year period, making them well suited for building and evaluating prediction models for influenza.

Our experimental design leverages a robust longitudinal sampling strategy. Participants are recruited from an established annual influenza vaccination cohort with detailed vaccination records and prior exposure history. Blood samples are collected at pre-vaccination baselines (days -14 and 0) and at key post-vaccination time points (days 1, 7, 28, 90 and ~350/365). This schedule is designed to capture the dynamic evolution of the immune response, from early innate activation to long-term antibody durability. Data will be collected over a four-year period, allowing us to capture repeated exposures in the same participants over multiple flu seasons.
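For illustration, the sampling design and assay panel can be represented programmatically. The sketch below is a minimal Python example; the names and structure are illustrative assumptions, not part of any published CMI-Flu schema.

```python
# Minimal sketch of the sampling design described above.
# Day values and assay names follow the text; the structure itself is
# illustrative, not a published CMI-Flu schema.

SAMPLING_DAYS = [-14, 0, 1, 7, 28, 90, 350]  # day ~350/365 approximated as 350

ASSAYS = {
    "innate": ["PBMC RNA-seq", "high-dimensional flow cytometry",
               "multiplex cytokine assays"],
    "adaptive": ["AIM assay", "TCR sequencing", "BCR repertoire",
                 "HAI", "neutralization"],
    "genetic": ["SNP array", "targeted HLA/IG/TR sequencing"],
}

def planned_samples(participant_id: str, season: int) -> list[dict]:
    """Enumerate the planned blood draws for one participant in one season."""
    return [
        {"participant": participant_id, "season": season, "study_day": day}
        for day in SAMPLING_DAYS
    ]

if __name__ == "__main__":
    for sample in planned_samples("P001", season=1):
        print(sample)
```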

Leveraging existing data


Experimental data generated by the CMI-Flu consortium will be complemented by additional influenza vaccine studies. A wealth of public data already exists in numerous formats across different databases. Despite the availability of these data, the ability to leverage multiple datasets to answer a single question is hampered by several factors, notably limited visibility into what data is available and the difficulties of standardizing and normalizing data across different studies. The CMI-Flu central database addresses these barriers by providing simplified access to a wide range of influenza studies, for large-scale modeling as well as for targeted questions.

Our approach to data standardization


All data will be rigorously curated and standardized in line with HIPC data standards to ensure consistency and facilitate cross-study comparisons. Our integrated central database, hosted on the CMI-X website, will organize information into clear categories: subject demographics, experimental assay results, and metadata/ontologies.
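As a rough illustration of how these categories might map onto records, the sketch below uses Python dataclasses. The field names are assumptions made for illustration and do not represent the final HIPC-aligned schema.

```python
# Illustrative only: field names are assumptions, not the final HIPC-aligned
# CMI-Flu schema.
from dataclasses import dataclass, field

@dataclass
class Subject:
    """Subject demographics."""
    subject_id: str
    age: int
    sex: str
    vaccination_history: list[str] = field(default_factory=list)

@dataclass
class AssayResult:
    """One measurement from an experimental assay."""
    subject_id: str
    study_day: int
    assay_type: str   # e.g. "HAI", "RNA-seq", "flow cytometry"
    analyte: str      # e.g. vaccine strain, gene, or cell population
    value: float
    unit: str

@dataclass
class OntologyTerm:
    """Metadata/ontology annotation attached to subjects or assay results."""
    term_id: str
    term_label: str
    source_ontology: str
```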

By adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles, we aim to create a resource that is not only invaluable for our internal modeling efforts but also broadly accessible to the scientific community.

CMI-Flu central database schema: Structure and implementation


To be completed

AI-ready datasets


To empower advanced computational modeling and ensure that our data can be readily exploited by machine learning and artificial intelligence (AI) approaches, we have transformed our heterogeneous datasets into an AI-ready format. Our integrated datasets are rigorously curated, standardized, and harmonized, ensuring that every entry is consistent, high-quality, and interoperable across multiple studies. By leveraging the robust infrastructure of the ImmuneSpace platform, we have organized data into clearly defined categories - including subject demographics, experimental assay results, and comprehensive metadata - thereby facilitating efficient querying, analysis, and modeling.

Each dataset undergoes a series of quality control and normalization steps to mitigate batch effects and differences in experimental platforms, ensuring that downstream AI models can be trained on data that truly reflects biological variability rather than technical noise. For instance, measurements from serological assays, flow cytometry, and RNA sequencing are processed to yield unified metrics such as normalized HAI titers and standardized gene expression values. This level of rigor not only bolsters the reliability of predictive models but also enhances reproducibility across independent studies and contest cycles.
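As a concrete example of the kind of normalization involved, titers are often expressed as log2 fold changes over the pre-vaccination baseline, and expression values are commonly standardized within a study. The snippet below is a generic sketch of those two operations, not a description of the consortium's actual processing pipeline.

```python
import math
import statistics

def hai_log2_fold_change(baseline_titer: float, post_titer: float) -> float:
    """Log2 fold change of an HAI titer relative to the pre-vaccination baseline."""
    return math.log2(post_titer / baseline_titer)

def zscore_within_study(values: list[float]) -> list[float]:
    """Standardize expression values within one study, a simple stand-in for
    fuller batch-effect correction across platforms."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# A 4-fold titer rise (40 -> 160) corresponds to a log2 fold change of 2.0.
print(hai_log2_fold_change(40, 160))          # 2.0
print(zscore_within_study([5.1, 6.3, 4.8, 7.0]))
```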

Moreover, our AI-ready datasets are enriched with multi-dimensional data—from baseline immunological profiles to longitudinal changes in immune responses—which provide a rich substrate for feature extraction and algorithm training. This comprehensive approach enables researchers to explore complex relationships between innate responses, adaptive immunity, and genetic factors, ultimately leading to models that can predict vaccine breadth and durability with greater precision. By making these curated datasets publicly available through our dedicated CMI-X website, we foster an open-science environment where academic and industry researchers alike can leverage state-of-the-art data to drive innovation in influenza systems vaccinology.

Current API endpoints


Subject- and cohort-level metadata

Assay data:
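The endpoint documentation is still in progress. Purely as an illustration of how such endpoints could be queried once published, the sketch below uses Python's requests library against a placeholder base URL; the URL, paths, and parameters are hypothetical and not actual CMI-Flu API routes.

```python
# Hypothetical sketch: the base URL, paths, and parameters below are
# placeholders, not documented CMI-Flu API routes.
import requests

BASE_URL = "https://example.org/cmi-flu/api"  # placeholder

def get_json(path: str, **params):
    """GET a JSON payload from a (hypothetical) endpoint."""
    response = requests.get(f"{BASE_URL}/{path}", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# e.g. subject/cohort-level metadata, then assay data for one subject
subjects = get_json("subjects", cohort="example-cohort")
hai_results = get_json("assays/hai", subject_id="P001")
```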
