Student Brown Bag Seminar

A bioinformatics pipeline for interpretable modeling of antigenic change based on persistent homology

When

1 – 2 p.m., Sept. 6, 2023

Where

Abstract: Immune evasion by seasonal influenza viruses, driven by their rapid evolution, causes significant morbidity every year. It is important to know when two strains are antigenically distant, but characterizing these differences in the lab is costly. This talk will discuss the motivations, challenges, and ongoing development of the CARROT TOP pipeline, which aims to determine antigenic differences between paired H3N2 proteins efficiently and interpretably, using a dataset of antigenic tests conducted from 2016 to the present. AlphaFold, FASPR (Fast and Accurate Protein Side Chain Packing), and DLPacker generate 3D protein structures from genetic sequence data. The TDA package in R is then used to compute persistence diagrams (PDs) from the alpha shape filtration of each 3D structure. Each pair of PDs is converted into a vector via a binned persistent Gaussian kernel. This vector then serves as input to least-angle and LASSO regressions against measures of immune distance taken in the lab. Preliminary results on a small subset of the full dataset will be discussed, along with promising future research directions for ensuring interpretability in the original protein space.
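To make the vectorization step in the abstract concrete, here is a minimal sketch in Python of one way a pair of persistence diagrams could be turned into a single feature vector. It is an illustration under stated assumptions, not the CARROT TOP implementation: the abstract does not specify the kernel, so this uses a simplified, persistence-weighted Gaussian smoothing on a fixed grid (in the spirit of a persistence image) as a stand-in. The function names, grid size, bandwidth `sigma`, and toy diagrams are all hypothetical.

```python
import math


def binned_gaussian_vector(diagram, n_bins=4, lo=0.0, hi=1.0, sigma=0.1):
    """Smooth one persistence diagram onto an n_bins x n_bins grid.

    diagram: list of (birth, death) pairs with lo <= birth <= death <= hi.
    Returns a flat list of n_bins * n_bins non-negative values.
    Assumption: a persistence-weighted Gaussian bump per point, evaluated
    at bin centers in (birth, persistence) coordinates.
    """
    step = (hi - lo) / n_bins
    centers = [lo + (i + 0.5) * step for i in range(n_bins)]
    vec = []
    for b_c in centers:        # bin center along the birth axis
        for p_c in centers:    # bin center along the persistence axis
            total = 0.0
            for birth, death in diagram:
                pers = death - birth
                d2 = (birth - b_c) ** 2 + (pers - p_c) ** 2
                # Weight by persistence so short-lived features count less.
                total += pers * math.exp(-d2 / (2 * sigma ** 2))
            vec.append(total)
    return vec


def pair_features(diagram_a, diagram_b, **kwargs):
    """Combine two diagrams into one vector via absolute bin-wise difference."""
    va = binned_gaussian_vector(diagram_a, **kwargs)
    vb = binned_gaussian_vector(diagram_b, **kwargs)
    return [abs(x - y) for x, y in zip(va, vb)]


# Toy diagrams (made-up values, not real protein data).
pd_a = [(0.1, 0.5), (0.2, 0.3)]
pd_b = [(0.1, 0.9)]
features = pair_features(pd_a, pd_b)
```

In a setup like the one the abstract describes, each such vector would form one row of a design matrix, with the lab-measured immune distance for that strain pair as the response in a sparse regression such as LASSO. Because every coordinate corresponds to a fixed region of the diagram, nonzero coefficients point back to specific ranges of birth and persistence.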