In a recent study published in Science Advances, researchers presented a deep learning framework, termed DeepAIR, to accurately predict adaptive immune receptor (AIR)-antigen binding by integrating both sequence and structure features of AIRs.
One of the most essential processes in adaptive immunity is structural docking between AIRs, such as T cell receptors (TCRs) and B cell receptors (BCRs), and their corresponding antigens. Current approaches for predicting AIR-antigen interaction, however, focus heavily on sequence-derived AIR properties, leaving out structural features that are critical for binding affinity.
About the study
In the present study, the researchers developed DeepAIR for structure-boosted analysis of adaptive immune receptors.
DeepAIR integrates three-dimensional structural information for AIR-antigen binding prediction and immune repertoire classification. It uses AlphaFold2-predicted AIR structures and focuses on the complementarity-determining region 3 (CDR3) loop, combining its structure and sequence with V(D)J gene usage. Data processing proceeds in three stages: multichannel feature extraction, multimodal feature fusion, and task-specific prediction.
The fusion module combines a gating-based mechanism, which extracts important features from the encoded data, with a tensor fusion mechanism. To objectively quantify the contribution of structural data, the researchers also built two variants: DeepAIR-stru, which used only structural data, and DeepAIR-seq, which learned from sequence and V(D)J gene usage data. AlphaFold2 was used to predict unliganded (antigen-free) AIR structures for model construction.
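The article names a gating mechanism and a tensor fusion mechanism but does not give their equations. The sketch below is a minimal illustration, assuming a per-feature sigmoid gate and a Tensor-Fusion-Network-style outer product over the three channels (structure, sequence, V(D)J gene usage). The function names, toy embeddings, and fixed gate logits are hypothetical; in a trained model the gates and embeddings would come from learned layers, and this is not DeepAIR's actual implementation.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate(features: Sequence[float], gate_logits: Sequence[float]) -> List[float]:
    """Scale each feature by a sigmoid gate (hypothetical; real gate logits would be learned)."""
    if len(features) != len(gate_logits):
        raise ValueError("features and gate_logits must have the same length")
    return [f * sigmoid(g) for f, g in zip(features, gate_logits)]


def tensor_fusion(*modalities: Sequence[float]) -> List[float]:
    """Flattened outer product of each modality vector extended with a constant 1.

    The appended 1 keeps the single-modality terms alongside every
    cross-modality interaction term (Tensor Fusion Network style).
    """
    fused = [1.0]
    for vec in modalities:
        extended = list(vec) + [1.0]
        fused = [x * y for x in fused for y in extended]
    return fused


# Hypothetical toy embeddings for the three channels.
structure_emb = [0.5, -1.0]
sequence_emb = [2.0, 0.0, 1.0]
gene_emb = [1.0]

# Zero logits give sigmoid(0) = 0.5, so every feature is halved.
fused = tensor_fusion(
    gate(structure_emb, [0.0, 0.0]),
    gate(sequence_emb, [0.0, 0.0, 0.0]),
    gate(gene_emb, [0.0]),
)
print(len(fused))  # (2 + 1) * (3 + 1) * (1 + 1) = 24
```

The fused vector grows multiplicatively with the number of channels, which is one reason a gate that suppresses uninformative features before fusion can be useful.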
Experimentally determined TCR and BCR (antibody) structures were obtained from the Protein Data Bank (PDB). AIR structures were predicted from the amino acid sequences of the full-length TCR beta and BCR heavy chains.
Prediction accuracy was assessed using the root mean square deviation (RMSD) between the predicted and experimentally determined AIR structures. As an observed proxy for AIR-antigen binding affinity, the team used the counts of unique TCR molecules captured by each peptide-major histocompatibility complex (pMHC, the antigen).
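RMSD is the square root of the mean squared distance between corresponding atoms of two structures, so lower values mean a closer match. The minimal sketch below assumes the predicted and experimental structures have already been superimposed (real pipelines typically align them first, for example with the Kabsch algorithm) and that atoms, such as C-alpha atoms, are matched one to one. The coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD between two equally sized, already superimposed coordinate sets (e.g., in angstroms)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference):
        # Squared Euclidean distance between one pair of matched atoms.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(predicted))


# Hypothetical C-alpha coordinates for a three-residue CDR3 fragment.
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
ref = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
print(rmsd(pred, ref))  # 1.0
```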
The pMHC-captured single-cell TCR data were obtained from the 10x Genomics website and curated using the Integrative COntext-specific Normalization (ICON) workflow. DeepAIR's performance was compared with that of DeepAIR-stru, DeepAIR-seq, and DeepTCR to examine whether the predicted binding affinity was accurate enough to identify specific binding between a TCR and a pMHC.
The researchers also evaluated how well DeepAIR predicted the binding reactivity of TCRs and BCRs to a specific antigen or epitope.
DeepAIR's functions include AIR-antigen binding prediction and immune repertoire classification. For each pMHC, the TCRs in the dataset were split into training (70%), validation (20%), and test (10%) sets.
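Splitting per pMHC, rather than over the pooled data, keeps every antigen represented in all three sets. The sketch below is a minimal illustration of such a stratified 70/20/10 split; the function name, record format, seed, and identifiers are hypothetical, and the exact shuffling and rounding used in the study are not described in the article.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (pMHC identifier, TCR identifier)


def split_per_group(
    records: List[Record],
    seed: int = 0,
    fractions: Tuple[float, float, float] = (0.7, 0.2, 0.1),
) -> Dict[str, List[Record]]:
    """Shuffle and split the TCRs of each pMHC separately into train/val/test."""
    rng = random.Random(seed)
    by_pmhc: Dict[str, List[Record]] = defaultdict(list)
    for pmhc, tcr in records:
        by_pmhc[pmhc].append((pmhc, tcr))

    splits: Dict[str, List[Record]] = {"train": [], "val": [], "test": []}
    for pmhc in sorted(by_pmhc):  # sorted for reproducible ordering
        group = list(by_pmhc[pmhc])
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder (about 10%)
    return splits


# Hypothetical data: two pMHCs with ten TCRs each.
records = [(p, f"tcr_{p}_{i}") for p in ("pMHC_A", "pMHC_B") for i in range(10)]
splits = split_per_group(records, seed=42)
print({name: len(part) for name, part in splits.items()})
# {'train': 14, 'val': 4, 'test': 2}
```

With very small groups, rounding can leave a pMHC with no test examples, so a real pipeline would likely enforce a minimum group size.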
To test whether AlphaFold2's confidence tracks structural accuracy, a Pearson's correlation analysis was performed between RMSD values and AlphaFold2 pLDDT (predicted Local Distance Difference Test) scores, comparing the predicted TCR CDR3 structures with their experimental counterparts from the Structural T-cell Receptor Database (STCRDab).
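Because pLDDT is a confidence score (higher is better) and RMSD is an error (lower is better), a useful confidence measure should correlate negatively with RMSD. The minimal Pearson's correlation sketch below uses hypothetical pLDDT and RMSD values chosen to lie on a line; they are not data from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's correlation coefficient r between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical per-structure values: higher confidence paired with lower error.
plddt = [90.0, 80.0, 70.0, 60.0]
rmsd_values = [0.5, 1.0, 1.5, 2.0]
print(round(pearson(plddt, rmsd_values), 3))  # -1.0
```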
BCRs with experimentally validated antigens were obtained from the Immune Epitope Database (IEDB), and antibodies with experimentally validated binding epitopes were obtained from the Coronavirus Antibody Database (CoV-AbDab).
Results
DeepAIR performed strongly in predicting both the binding affinity of TCRs and the binding reactivity of TCRs and BCRs. It achieved a Pearson's correlation of 0.8 for TCR binding affinity and median areas under the receiver operating characteristic curve (AUCs) of 0.94 and 0.90 for BCR and TCR binding reactivity, respectively.
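An AUC can be read as the probability that a randomly chosen binder receives a higher score than a randomly chosen non-binder (0.5 is chance, 1.0 is perfect). The sketch below computes AUC from that pairwise definition and then takes a median, assuming, as a hypothetical reading of "median AUC", that one AUC is computed per antigen or epitope. All labels and scores are hypothetical.

```python
import statistics
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(random positive outscores random negative); ties count as 0.5."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-epitope evaluations (1 = binder, 0 = non-binder).
per_epitope_auc = [
    roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]),  # 3 of 4 pairs ordered correctly
    roc_auc([1, 0], [0.8, 0.2]),                  # perfect ranking
    roc_auc([1, 0], [0.3, 0.7]),                  # inverted ranking
]
print(statistics.median(per_epitope_auc))  # 0.75
```

The pairwise loop is quadratic in dataset size; it is written this way for clarity, and library routines such as scikit-learn's `roc_auc_score` use a faster rank-based computation.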
Meanwhile, using the TCR and BCR repertoires, DeepAIR correctly identified every patient with nasopharyngeal carcinoma and inflammatory bowel disease in the test data.
Across six datasets covering TCRs and BCRs (antibodies), DeepAIR achieved higher AUC values than state-of-the-art (SOTA) methods, including soNNia, DeepTCR, and TCRAI, in all three AIR-antigen analysis tasks.
Incorporating structural data from the CDR3 region significantly improved DeepAIR's performance, which was also affected by sequence similarity between the training and test data.
DeepAIR accurately predicted TCR structures, revealing that stabilizing the paired α-β structure is crucial for TCR-antigen binding affinity. It also accurately predicted AIR-antigen binding affinity and identified residues that directly contribute to AIR-antigen binding. The pLDDT score reflected the accuracy of the predicted TCR CDR3 structures.
DeepAIR is interpretable: its attention weights highlight residues in the alpha and beta chains that are vital for AIR-antigen binding. By pointing to structurally and functionally important residues in both chains, the model also enables examination of how AIR-antigen complexes are stabilized.
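Attention-based interpretation generally means normalizing per-residue attention scores and ranking residues by weight. The sketch below shows only that ranking step, under stated assumptions: the CDR3-beta sequence and raw scores are hypothetical, the softmax normalization is an assumption, and how DeepAIR actually computes and aggregates its attention weights is not described in the article.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max score before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_residues(residues: str, scores: Sequence[float], k: int) -> List[Tuple[int, str, float]]:
    """Return the k highest-weighted residues as (1-based position, amino acid, weight)."""
    if len(residues) != len(scores):
        raise ValueError("need exactly one attention score per residue")
    weights = softmax(scores)
    ranked = sorted(enumerate(weights), key=lambda item: item[1], reverse=True)
    return [(i + 1, residues[i], w) for i, w in ranked[:k]]


# Hypothetical CDR3-beta sequence and raw attention scores (one per residue).
cdr3_beta = "CASSLGQAYEQYF"
raw_scores = [0.1, 0.2, 0.1, 0.3, 2.0, 1.5, 0.4, 0.2, 0.1, 0.3, 0.2, 0.1, 0.1]

for pos, aa, w in top_residues(cdr3_beta, raw_scores, k=2):
    print(pos, aa, round(w, 3))
```

High attention marks a residue as influential for the model's prediction; whether it is a true contact residue still has to be checked against structural data.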
The median prediction accuracy for AIR CDR3 regions predicted from the full sequence was comparable to the median accuracy AlphaFold2 achieved in the 14th Critical Assessment of Protein Structure Prediction (CASP14), indicating that AlphaFold2-predicted structures are reliable for this use. The AIR-antigen affinities predicted by DeepAIR were the closest to the experimental observations among the compared methods.
Overall, the findings showed that DeepAIR improves adaptive immune receptor analysis by integrating sequence and structural data for AIR-antigen binding prediction. It outperforms SOTA predictors and provides an interpretable model for identifying important residues in the alpha and beta chains.
DeepAIR identifies contact residues between the beta chain and the antigen, as well as critical residues on the α chain that stabilize the AIR structure. This approach enables a better understanding of AIR-antigen complex stabilization and could support the design of personalized immunotherapies.