The CMI-Flu Project
Goals
Towards a predictive understanding of influenza immunity through experimental data integration, iterative model development, and rigorous assessment of model quality.
The central hypothesis of the CMI-Flu (Computational Models of Immunity - Influenza Vaccination) project is that computational models can accurately predict the breadth and durability of influenza vaccine responses. Vaccine-induced immune memory is conferred by antigen-specific B cells and T cells. Their development is influenced by the immune exposure history of the host, the presence of innate immune responses during priming of the adaptive response, and genetic factors intrinsic to the host. We propose to systematically capture data on these variables, use them to develop and iteratively improve computational models, and rigorously evaluate model predictions in a public contest.
This project targets two central questions:
- What determines if influenza vaccination induces broad immune protection in an individual?
- What determines the durability of the vaccine response?
How will we achieve our aims?
1) Capture existing influenza vaccine data and generate new experimental data that connect immunological variables with vaccine breadth and durability
2) Generate computational models predicting the breadth and durability of vaccine responses and the cascade of immune events leading up to them.
3) Rigorously evaluate model prediction performance on unseen data in an open competition.
Multi-Assay Profiling of Influenza Responses
This project captures and generates data on influenza vaccine breadth and durability. We will:
- Capture, in a standardized format, data from existing studies that profile immunological variables and the breadth and/or durability of influenza vaccine responses.
- Generate new experimental datasets that comprehensively assess immunological variables (Figure 1) along with vaccine breadth (based on HAI titers against a broad panel of influenza strains on day 28 post-vaccination) and durability (based on neutralization of the vaccine strains on day 365). Multi-omic analyses of each individual will characterize their immune state pre- and immediately post-vaccination including the innate response. Targeted sequencing of recombined TCR and BCR repertoires will establish the receptors present in each individual. Single-cell sequencing of antigen-specific T cells and plasmablasts will reveal vaccine-specific receptors. Genetic factors will be assessed through SNP arrays, targeted sequencing of the HLA locus, and the germline TR/IG loci (Figure 2).
Measuring Breadth and Durability
Unlike most studies, which measure only the peak vaccine response at one month, we assess immunity at three key timepoints: day 28 (peak response), day 90 (one influenza season; germinal centers still active), and day 365 (long-term memory). Our primary durability readout is neutralizing antibody titers against all four vaccine strains (H1N1, H3N2, B Victoria, B Yamagata) at 12 months. Breadth is assessed via HAI titers against panels of historical H1N1 and H3N2 strains from the past decade.
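The breadth and durability readouts described above can be turned into simple per-participant summaries. The sketch below is illustrative only: the project text does not define its exact scoring, so the threshold of 40 (the conventional 1:40 HAI level), the function names, the strain labels, and all titer values are assumptions made for this example.

```python
import math

# Assumed cutoff: reciprocal HAI titer >= 40 (the conventional 1:40 level).
SEROPROTECTION_TITER = 40


def geometric_mean_titer(titers):
    """Geometric mean of reciprocal titers (e.g. 40 stands for 1:40)."""
    titers = list(titers)
    return 2 ** (sum(math.log2(t) for t in titers) / len(titers))


def breadth_fraction(panel_titers, threshold=SEROPROTECTION_TITER):
    """Fraction of panel strains whose day-28 HAI titer reaches the threshold."""
    hits = sum(1 for t in panel_titers.values() if t >= threshold)
    return hits / len(panel_titers)


def log2_retention(day28, day365):
    """Per-strain log2 ratio of day-365 to day-28 neutralization titers."""
    return {strain: math.log2(day365[strain] / day28[strain]) for strain in day28}


# Hypothetical example values, not project data.
hai_day28 = {"H1N1_strain_a": 160, "H1N1_strain_b": 40,
             "H3N2_strain_a": 20, "H3N2_strain_b": 10}
neut_day28 = {"H1N1": 320, "H3N2": 160}
neut_day365 = {"H1N1": 80, "H3N2": 160}

breadth = breadth_fraction(hai_day28)
gmt = geometric_mean_titer(hai_day28.values())
retention = log2_retention(neut_day28, neut_day365)
print(breadth, round(gmt, 1), retention)
```

Working on the log2 scale matches the two-fold dilution series used in titer assays, so a retention value of -2 corresponds to a four-fold drop between day 28 and day 365.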
Readouts Across the Immune Response
We collect data across five domains (Figure 1):
- Exposure history — vaccination records, clinical questionnaires, and age-inferred lifetime strain exposure
- Genetic factors — SNP arrays (~1.8M SNPs), HLA typing, and IG/TR locus genotyping
- Innate immune state — RNA-Seq, flow cytometry, and cytokine profiling pre- and immediately post-vaccination
- CD4+ T cell responses — AIM assays, TCR repertoire sequencing, and single-cell RNA/TCR-Seq of vaccine-specific cells
- Antibody responses — BCR repertoire sequencing, plasmablast single-cell profiling, HAI assays, and neutralization assays
Data collection spans four years, capturing repeated exposures across multiple flu seasons. This deeply characterized cohort will support computational models predicting vaccine breadth and durability.
Figure 1: The immune response to influenza vaccination comprises a cascade of events.
Longitudinal sampling
Our experimental design leverages a robust longitudinal sampling strategy. Participants are recruited from an established annual influenza vaccination cohort with detailed vaccination records and prior exposure history. Blood samples are collected at baseline, starting two weeks before vaccination (days -14 and 0), and at key post-vaccination timepoints (days 1, 7, 28, 90, and ~350/365). This schedule is designed to capture the dynamic evolution of the immune response, from early innate activation to long-term antibody durability. Data will be collected over a four-year period, allowing us to capture repeated exposures in the same participants over multiple flu seasons.
Figure 2: Timepoints for data collection.
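Because real blood draws rarely fall on the exact planned day (the final visit is listed as ~350/365), analyses usually need to map each actual collection day to a nominal timepoint. The following is a minimal sketch of one way to do that. The nearest-day rule, the phase labels, and the names `NOMINAL_VISITS` and `nearest_visit` are assumptions for illustration, not the consortium's procedure.

```python
# Nominal study days from the sampling schedule, with illustrative phase labels.
NOMINAL_VISITS = {
    -14: "baseline",
    0: "baseline (vaccination day)",
    1: "early post-vaccination",
    7: "early post-vaccination",
    28: "peak response",
    90: "one influenza season",
    365: "long-term memory",
}


def nearest_visit(actual_day):
    """Return the nominal study day closest to the actual collection day."""
    return min(NOMINAL_VISITS, key=lambda day: abs(day - actual_day))


# Hypothetical collection days for one participant.
for actual in (-13, 0, 8, 30, 350):
    nominal = nearest_visit(actual)
    print(actual, "->", nominal, NOMINAL_VISITS[nominal])
```

A production pipeline would more likely use explicit visit windows and flag out-of-window samples rather than silently assigning them to the nearest visit.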
Leveraging existing data
Experimental data generated by the CMI-Flu consortium will be complemented by additional influenza vaccine studies. A wealth of public data already exists in numerous formats across different databases. Although these data are available, leveraging multiple datasets to answer a single question is hampered by several factors, notably limited awareness of what data exist and the difficulty of standardizing and normalizing data across studies. Our goal is to provide simplified access to a wide range of influenza studies, supporting both the building of large computational models and the extraction of specific subsets of data for specific questions.
AI-ready Datasets
Data (Figure 3) will be harmonized and made available through the CMI-Flu repository. This will enable robust, generalizable models of vaccine response. By adhering to FAIR (Findable, Accessible, Interoperable, and Reusable) principles, we aim to create a resource that is not only invaluable for our internal modeling efforts but also broadly accessible to the scientific community.
To empower advanced computational modeling and ensure that our data can be readily exploited by machine learning and artificial intelligence (AI) approaches, we have transformed our heterogeneous datasets into an AI-ready format. Our integrated datasets are rigorously curated, standardized, and harmonized, ensuring that every entry is consistent, high-quality, and interoperable across multiple studies. By leveraging the robust infrastructure of the ImmuneSpace platform, we have organized data into clearly defined categories - including subject demographics, experimental assay results, and comprehensive metadata - thereby facilitating efficient querying, analysis, and modeling.
Each dataset undergoes a series of quality control and normalization steps to mitigate batch effects and differences in experimental platforms, ensuring that downstream AI models can be trained on data that truly reflects biological variability rather than technical noise. For instance, measurements from serological assays, flow cytometry, and RNA sequencing are processed to yield unified metrics such as normalized HAI titers and standardized gene expression values. This level of rigor not only bolsters the reliability of predictive models but also enhances reproducibility across independent studies and contest cycles.
Moreover, our AI-ready datasets are enriched with multi-dimensional data—from baseline immunological profiles to longitudinal changes in immune responses—which provide a rich substrate for feature extraction and algorithm training. This comprehensive approach enables researchers to explore complex relationships between innate responses, adaptive immunity, and genetic factors, ultimately leading to models that can predict vaccine breadth and durability with greater precision. By making these curated datasets publicly available through our dedicated CMI-X website, we foster an open-science environment where academic and industry researchers alike can leverage state-of-the-art data to drive innovation in influenza systems vaccinology.
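The normalization steps mentioned in this section (normalized HAI titers, standardized gene expression values) can be sketched as follows. This is an illustration under stated assumptions, not the CMI-Flu pipeline: the log2 titer transform and the within-study z-score are common choices we picked for the example, and the function names and input values are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def log2_titer(reciprocal_titer):
    """Put a reciprocal HAI titer (e.g. 40 for 1:40) on the log2 scale."""
    return math.log2(reciprocal_titer)


def zscore_within_study(records):
    """Standardize values separately within each study.

    records: list of (study_id, value) pairs.
    Returns z-scores in the same order as the input. Centering and scaling
    per study removes study-level shifts in location and scale, one simple
    way to reduce batch effects.
    """
    by_study = defaultdict(list)
    for study, value in records:
        by_study[study].append(value)
    stats = {s: (mean(v), pstdev(v)) for s, v in by_study.items()}
    scores = []
    for study, value in records:
        mu, sd = stats[study]
        scores.append((value - mu) / sd if sd > 0 else 0.0)
    return scores


# Hypothetical expression values for one gene measured in two studies.
expression = [("study_A", 1.0), ("study_A", 3.0),
              ("study_B", 10.0), ("study_B", 30.0)]
print(zscore_within_study(expression))
print(log2_titer(40))
```

Per-study standardization also removes any true biological difference between cohorts, so it should be treated as a baseline rather than a complete batch-correction method.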
Generating Models of Immunity
Using our newly generated experimental data alongside public studies, we will generate computational models predicting the breadth and durability of vaccine responses and the cascade of immune events leading up to them.
We will:
- Adapt and refine models developed by our center investigators to answer questions posed in this grant. These include: How does the pre-vaccination immune state drive antibody responses and their durability? What host features dictate the breadth of the antibody response measured by HAI titers? Can vaccine-specific antibody clonotypes be predicted based on the BCR repertoire before vaccination? Can T cell response predictions improve predictions of vaccine-induced antibodies?
- Implement existing published models from outside groups. We will query the literature for publications on computational models of vaccine responses and adapt them to our experimental datasets.
- Combine and expand existing models to develop an integrated understanding of influenza vaccine responses. We will integrate models to consider multiple factors influencing the vaccine response using both ‘black-box’ machine learning models and ‘mechanistic’ models that reflect individual steps in the vaccine response.
Community engagement
Making influenza vaccine data accessible to all
Harmonizing data across studies with different readouts is challenging but unlocks real gains in model prediction power. The CMI-Flu central database brings together our own data alongside other public datasets, standardized to enable cross-study integration and comparison.
A community prediction challenge
Computational models that forecast individual immune responses to the influenza vaccine can reveal mechanistic insights and open the door to personalized vaccination. We invite researchers to submit models to the CMI-Flu Prediction Challenge, where they are evaluated against independent datasets — mirroring the rigorous approach used in areas like protein structure prediction.
This challenge builds on our CMI-PB series (Bordetella pertussis), where the strongest models excelled at handling nonlinearities, reducing large feature sets, and applying advanced preprocessing, while models repurposed from other vaccine settings fell short. CMI-Flu is part of the broader CMI-X prediction series.
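Evaluating submitted models against independent data requires a scoring rule. The text above does not state which metric the CMI-Flu challenge uses, so the sketch below assumes Spearman rank correlation between predicted and observed values for held-out participants, a common choice when the ranking of responders matters more than absolute values. All function names and data are hypothetical.

```python
import math
from statistics import mean


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(predicted, observed):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(observed))


# Hypothetical held-out participants: model scores vs. measured log2 titers.
predicted = [0.2, 0.9, 0.4, 0.7]
observed = [3.3, 7.3, 5.3, 6.3]
print(spearman(predicted, observed))
```

The score is undefined when either input is constant (the denominator is zero), so a real scoring script would need to handle that case explicitly.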
Building a collaborative research community for influenza
The CMI-X series is designed to accelerate progress beyond what any single group could achieve. Participants are invited to discuss influenza immunity, data, and modeling throughout the challenge. All submitted models include reproducible code, and contributors can co-author future CMI manuscripts while retaining the freedom to publish independently.
We are also developing open-access teaching materials — including a lecture on influenza biology and vaccine history, a hands-on data lab, and a multi-week prediction project — for use by the wider educator community.
Impact and Future Directions
Through this open, transparent, and quantitative process of generating and publishing experimental data, coupled with building and evaluating computational models of influenza vaccination-induced immunity, we aim to measure how well computational models predict vaccine breadth and durability. This work helps us not only to understand the human immune response but also to identify potential vaccine targets and promising candidates, which will further inform vaccine development.
We purposely push to make these models generalizable in the hope that these types of analyses and tools can be extended and applied to other diseases as well, ultimately driving advances that improve public health globally.
Funding
This project is funded through the NIH (U01AI187062).
Last updated: May 19, 2026, 12:38 p.m.