Computational Prediction of Amino Acid Preferences of Potentially Multispecific Peptide-Binding Domains Involved in Protein-Protein Interactions

Published: January 26, 2024
doi: 10.3791/66314

Summary

We describe a methodology based on sequence diversification to estimate the amino acid preferences of multispecific binding sites in protein-protein interactions (PPIs). In this strategy, thousands of potential peptide ligands are generated and screened in silico, thus overcoming some limitations of available experimental methods.

Abstract

Many protein-protein interactions involve the binding of short protein segments to peptide-binding domains. Usually, such interactions require the recognition of linear motifs with variable conservation. The combination of highly conserved and more variable regions in the same ligands often contributes to the multispecificity of binding, a common property of enzymes and cell signaling proteins. Characterization of amino acid preferences of peptide-binding domains is important for the design of mediators of protein-protein interactions (PPIs). Computational methods are an efficient alternative to the often costly and cumbersome experimental techniques, enabling the design of potential mediators that can be later validated in downstream experiments. Here, we describe a methodology using the Pepspec application of the Rosetta molecular modeling package to predict the amino acid preferences of peptide-binding domains. This methodology is useful when the structure of the receptor protein and the nature of the peptide ligand are both known or can be inferred. The methodology starts with a well-characterized anchor from the ligand, which is extended by randomly adding amino acid residues. The binding affinity of peptides generated this way is then evaluated by flexible-backbone peptide docking in order to select the peptides with the best predicted binding scores. These peptides are then used to calculate amino acid preferences and to optionally compute a position-weight matrix (PWM) that can be used in further studies. To illustrate the application of this methodology, we used the interaction between subunits of human interferon regulatory factor 5 (IRF5), previously known to be multispecific but globally guided by a short conserved motif called pLxIS. The estimated amino acid preferences were consistent with previous knowledge about the IRF5 binding surface. Positions occupied by phosphorylatable serine residues exhibited a high frequency of aspartate and glutamate, likely because their negatively charged side chains are similar to phosphoserine.

Introduction

Interaction between two proteins often involves the binding of short segments of amino acids to peptide-binding domains, resembling protein-peptide interfaces. Receptor proteins involved in such protein-protein interactions (PPIs) often have the ability to recognize a certain set of overlapping but divergent ligand sequences, a property known as multispecificity [1,2]. Multispecific recognition is a feature of many cellular proteins, but it is particularly remarkable in enzymes and cell signaling proteins [3]. Proteins interacting with multispecific binding sites often have a combination of more and less conserved regions in their sequence [4,5,6]. In this scenario, the more conserved sequence motifs are involved in stringent molecular interactions. Conversely, the more variable sequences interact with somewhat permissive surfaces in the receptor binding site. Usually, these less conserved but still functionally relevant segments are loops lacking defined secondary structure patterns or have even more dynamic conformations, such as those typical of intrinsically disordered proteins [7].

Identification of potential peptide ligands of binding sites is usually the first step in the design of mediators able to interfere with the corresponding PPIs [8]. However, ligands of multispecific binding sites rarely show a single most frequent amino acid residue at most sequence positions. Instead, these sites may have particular preferences for a specific class of amino acids according to their chemical properties, e.g., acidic and negatively charged amino acids such as aspartate or glutamate, bulky aromatic amino acids such as phenylalanine, or more hydrophobic residues such as the aliphatic amino acids alanine, valine, leucine, or isoleucine [3]. Several experimental methods can provide insights about amino acid preferences of protein binding sites, including directed evolution [9], multi-codon scanning mutagenesis [10], and deep mutational scanning [11]. All of these methods follow the approach of sequence diversification, which is based on introducing mutations to original ligands and further analyzing their effect on the function of the receptor protein (see Bratulic and Badran [12] for a comprehensive review). However, these methods often require the survey of large sequence libraries, which makes them cumbersome, costly, and time-consuming.
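
As a minimal sketch of what a class-level preference means in practice, the Python snippet below collapses the residues observed at one ligand position into chemical classes and reports the fraction of each class. The class assignments, the function name `class_preferences`, and the example residues are illustrative assumptions and are not taken from the protocol.

```python
from collections import Counter

# Illustrative grouping only (an assumption); other classifications are equally valid.
RESIDUE_CLASS = {
    "D": "acidic", "E": "acidic",
    "F": "aromatic", "W": "aromatic", "Y": "aromatic",
    "A": "aliphatic", "V": "aliphatic", "L": "aliphatic", "I": "aliphatic",
}


def class_preferences(column):
    """Return the fraction of residues in each chemical class for one ligand position."""
    counts = Counter(RESIDUE_CLASS.get(res, "other") for res in column)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Hypothetical residues observed at a single position across six ligands.
column = ["D", "E", "D", "S", "E", "D"]
prefs = class_preferences(column)
print(prefs)
```

In this toy column no single residue dominates, yet the acidic class clearly does, which is the situation described above for multispecific sites.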

Computational methods to infer the amino acid preferences of multispecific binding sites have the potential to circumvent the limitations of wet lab methods. Among these, the in silico sequence diversification approach evaluates the energetic impact of a wide range of amino acid replacements in the ligand sequence as a way to characterize the structural plasticity of the PPI [13]. This method begins with the structure or model of the peptide ligand bound to the receptor binding site and subsequently introduces mutations to the ligand sequence. Statistical and energy-scoring functions are then used to evaluate the impact of these mutations on stability and binding affinity. The set of best-scoring ligand sequences resulting from the evaluation phase can then be used to compute the amino acid preferences. This strategy has the potential to process a very large number of ligand sequences in an efficient manner. Therefore, it can provide a more complete and consistent inference of amino acid preferences compared to those computed from the more limited number of sequences that can usually be processed in wet lab approaches.
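
The mutate, score, and select cycle described above can be sketched in a few lines of Python. This is only a toy illustration: `toy_score` is a hypothetical stand-in for a real statistical or energy-scoring function, the seed peptide is invented, and nothing here reproduces how Rosetta generates or evaluates peptides.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(peptide):
    """Hypothetical stand-in for an energy function; lower is better.

    It simply rewards acidic residues so that the example has a visible signal.
    """
    return -sum(1.0 for res in peptide if res in "DE")


def diversify(seed, n_variants, rng):
    """Generate variants of `seed`, each carrying one random point substitution."""
    variants = []
    for _ in range(n_variants):
        pos = rng.randrange(len(seed))
        new_res = rng.choice(AMINO_ACIDS)
        variants.append(seed[:pos] + new_res + seed[pos + 1:])
    return variants


def select_best(peptides, score_fn, fraction=0.1):
    """Keep the best-scoring fraction of the peptides (at least one)."""
    ranked = sorted(peptides, key=score_fn)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]


rng = random.Random(0)                      # fixed seed for reproducibility
variants = diversify("GSLSIA", 200, rng)    # "GSLSIA" is an invented seed peptide
best = select_best(variants, toy_score, fraction=0.1)
print(len(best), best[:3])
```

The selected subset (`best`) plays the role of the best-scoring ligand sequences from which amino acid preferences would then be computed.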

The Pepspec application of the Rosetta molecular modeling suite [14] is a tool that performs sequence diversification as a key step of its peptide design mode. This application requires a structure or model of the receptor protein with a bound peptide, which may be as short as a single amino acid residue and is used as an anchor for the next steps. The sequence of the bound peptide is then extended (if necessary) and diversified to generate a large number of putative peptide ligands. The binding affinity of these peptides is then evaluated by flexible-backbone peptide docking in order to select those with the best predicted binding scores. Although the main output of this application is the best peptide candidates selected at the end of the design phase, the much larger set of peptides accepted during this phase can also be used to compute the amino acid preferences of the target binding site. Amino acid preferences are computed as the frequency of each amino acid residue per position of the ligand sequence, represented either as a position weight matrix (PWM) or as a more visual sequence logo.
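
To make the frequency calculation concrete, the following Python sketch computes per-position amino acid frequencies from a set of equal-length peptides and, optionally, a log-odds matrix. The peptide list is hypothetical, and the uniform background and pseudocount are assumptions made for illustration; they are not necessarily the choices made by Pepspec or by the authors.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequency_matrix(peptides, pseudocount=0.0):
    """Per-position amino acid frequencies for a list of equal-length peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        matrix.append(
            {aa: (counts.get(aa, 0) + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return matrix


def log_odds(matrix, background=1 / 20):
    """Convert frequencies to log2 odds against a uniform background.

    Frequencies must be non-zero, so build the matrix with pseudocount > 0.
    """
    return [
        {aa: math.log2(freq / background) for aa, freq in column.items()}
        for column in matrix
    ]


# Hypothetical accepted peptides (standard residues only).
peptides = ["DLSIS", "ELSIS", "DLAIS", "DLSID"]
freqs = frequency_matrix(peptides)
pwm = log_odds(frequency_matrix(peptides, pseudocount=1.0))
print(freqs[0]["D"], round(pwm[1]["L"], 2))
```

Each column of `freqs` sums to one, so it can be passed to a logo generator or inspected directly; the log-odds form is positive for residues enriched relative to the background and negative for depleted ones.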

In this article, we describe a protocol to estimate the amino acid preferences of the binding surface of a receptor protein involved in a PPI. The protocol is focused on PPIs in which a linear segment of the ligand protein is known to bind to the receptor protein, so the scenario can be modeled as a protein-peptide interface. In this scenario, conserved motifs from the ligand typically interact with defined pockets in the receptor binding site, although the entire ligand segment involved in the PPI may contain less conserved regions. A flowchart summarizing the major steps of the protocol is shown in Figure 1. The protocol starts with the 3D structure of the protein-protein complex and further reduces the ligand protein to the potential best-interacting segment, leaving the receptor protein intact. The best-interacting segment is inferred by using the BUDE Alanine Scan server [15], which conducts computational alanine scanning mutagenesis to identify hot-spot residues between the two interacting proteins. In this approach, residues from the ligand are individually replaced by alanine, and the estimated change in free energy or stability of the complex (ΔΔG) is then used to infer the relevance of the corresponding residue for the target PPI. Once the best-interacting segment is inferred, its complex with the receptor protein is used as the base structure submitted to Pepspec to perform sequence diversification.
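
One simple way to turn per-residue ΔΔG values into a candidate best-interacting segment is to look for the contiguous window with the largest summed ΔΔG. The Python sketch below shows this idea under stated assumptions: the values are invented, a larger positive ΔΔG is taken to mean a more important residue, and the fixed-window criterion is an illustrative choice rather than the procedure used by the BUDE Alanine Scan server or by the authors.

```python
def best_segment(ddg_values, window):
    """Return (start_index, total) for the contiguous window with the largest summed ΔΔG."""
    if window < 1 or window > len(ddg_values):
        raise ValueError("window must be between 1 and the number of residues")
    running = sum(ddg_values[:window])
    best_start, best_total = 0, running
    for start in range(1, len(ddg_values) - window + 1):
        # Slide the window by one residue: add the new one, drop the old one.
        running += ddg_values[start + window - 1] - ddg_values[start - 1]
        if running > best_total:
            best_start, best_total = start, running
    return best_start, best_total


# Hypothetical scan results: (residue, ΔΔG in arbitrary energy units).
scan = [("G", 0.0), ("S", 1.2), ("L", 6.5), ("S", 0.8),
        ("I", 7.1), ("A", 0.0), ("L", 0.5), ("E", 0.3)]
ddg = [value for _, value in scan]

start, total = best_segment(ddg, window=4)
segment = "".join(res for res, _ in scan[start:start + 4])
print(start, segment, round(total, 1))
```

In practice, the window length and any ΔΔG threshold would need to be chosen by inspecting the scan results and the structure of the interface.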

Figure 1: Overview of the main steps of the protocol proposed in this work. Numbers match step numbers in the protocol section. Figures were made with the protein-protein complex used as the example described in the text. In this complex, the protein chain considered as the receptor is shown in pink, while the chain considered as the ligand is shown in light blue with its predicted best-interacting segment highlighted in red.

One of the limitations of the suggested protocol is the requirement for a resolved structure of the protein-peptide interface. The protocol may alternatively begin with a model of the target protein-peptide interface, although the specific modeling steps are not described herein. Moreover, although the protocol can be conducted on a personal computer running any operating system, a Linux environment is required for the steps involving the Rosetta applications. A computer cluster is also highly recommended for the sequence diversification step due to the large number of iterations typically performed by Pepspec.

Application of the suggested protocol is illustrated with the estimation of amino acid preferences of the binding surface of IRF5, a member of the human interferon regulatory factor (IRF) family. We chose this protein as an example because, during its activation, two subunits bind to form a dimer whose structure is well characterized [16]. In IRF dimers, binding can be modeled as a protein-peptide interface in which one subunit provides the binding surface and the other one interacts through a region containing a short conserved motif called pLxIS [17,18]. In addition, binding to IRF subunits is multispecific; therefore, they can form homodimers, heterodimers, and complexes with other cellular proteins known as coactivators [18].

Protocol

1. Initial preparation of the protein-peptide interface

Downloading the structure of the protein-protein complex

Navigate to the Protein Data Bank (PDB) homepage (https://www.rcsb.org/) and type the PDB ID for the structure of the protein-protein complex in the main search box (Figure 2A). The PDB ID for the structure of the IRF5 dimer, used as an example in this work, is 3DSH [19]. In the main page for the desired…

Representative Results

In this article, we described a protocol to predict the amino acid preferences of the binding surface of IRF5, a member of a family of transcription factors known as human interferon regulatory factors. These proteins are regulators of innate and adaptive immune responses and participate in the differentiation and activation of several immune cells. IRF subunits have highly plastic and multispecific binding surfaces, being capable of forming homodimers, heterodimers, and complexes with other cellular proteins…

Discussion

The present article describes a protocol to estimate the amino acid preferences of potentially multispecific binding sites based on in silico sequence diversification. Few computational tools have been developed to estimate amino acid preferences of protein-peptide interfaces [14,25,26]. These tools have a predictive nature, but they differ in the computational algorithms used to perform their predictions and the corrections they i…

Disclosures

The authors have nothing to disclose.

Acknowledgements

Financial support from Sistema Nacional de Investigación (SNI) (grant numbers SNI-043-2023 and SNI-170-2021), Secretaría Nacional de Ciencia, Tecnología e Innovación (SENACYT) of Panama and Instituto para la Formación y Aprovechamiento de Recursos Humanos (IFARHU) is gratefully acknowledged. The authors would like to thank Dr. Miguel Rodríguez for carefully reviewing the manuscript.

Materials

Name | Source | URL | Reference
BUDE Alanine Scan Server | University of Edinburgh | https://pragmaticproteindesign.bio.ed.ac.uk/balas/ | doi: 10.1021/acschembio.9b00560
Rosetta Modeling Software | Rosetta Commons | https://www.rosettacommons.org/software | doi: 10.1002/prot.22851
UCSF Chimera | University of California San Francisco | https://www.cgl.ucsf.edu/chimera/ | doi: 10.1002/jcc.20084

References

  1. Kim, P. M., Lu, L. J., Xia, Y., Gerstein, M. B. Relating three-dimensional structures to protein networks provides evolutionary insights. Science. 314 (5807), 1938-1941 (2006).
  2. Schreiber, G., Keating, A. E. Protein binding specificity versus promiscuity. Current Opinion in Structural Biology. 21 (1), 50-61 (2011).
  3. Erijman, A., Aizner, Y., Shifman, J. M. Multispecific recognition: Mechanism, evolution, and design. Biochemistry. 50 (5), 602-611 (2011).
  4. Fromer, M., Shifman, J. M. Tradeoff between stability and multispecificity in the design of promiscuous proteins. PLoS Computational Biology. 5 (12), e1000627 (2009).
  5. Xie, T., Zmyslowski, A. M., Zhang, Y., Radhakrishnan, I. Structural basis for multispecificity of MRG domains. Structure. 23 (6), 1049-1057 (2015).
  6. Hendler, A., et al. Human SIRT1 multispecificity is modulated by active-site vicinity substitutions during natural evolution. Molecular Biology and Evolution. 38 (2), 545-556 (2021).
  7. Teilum, K., Olsen, J. G., Kragelund, B. B. On the specificity of protein-protein interactions in the context of disorder. The Biochemical Journal. 478 (11), 2035-2050 (2021).
  8. Pelay-Gimeno, M., Glas, A., Koch, O., Grossmann, T. N. Structure-based design of inhibitors of protein-protein interactions: Mimicking peptide binding epitopes. Angewandte Chemie (International ed. in English). 54 (31), 8896-8927 (2015).
  9. Wang, Y., Xue, P., Cao, M., Yu, T., Lane, S. T., Zhao, H. Directed evolution: Methodologies and applications. Chemical Reviews. 121 (20), 12384-12444 (2021).
  10. Liu, J., Cropp, T. A. Rational protein sequence diversification by multi-codon scanning mutagenesis. Methods in Molecular Biology. 978, 217-228 (2013).
  11. Wei, H., Li, X. Deep mutational scanning: A versatile tool in systematically mapping genotypes to phenotypes. Frontiers in Genetics. 14, 1087267 (2023).
  12. Bratulic, S., Badran, A. H. Modern methods for laboratory diversification of biomolecules. Current Opinion in Chemical Biology. 41, 50-60 (2017).
  13. Humphris, E. L., Kortemme, T. Prediction of protein-protein interface sequence diversity using flexible backbone computational protein design. Structure. 16 (12), 1777-1788 (2008).
  14. King, C. A., Bradley, P. Structure-based prediction of protein-peptide specificity in Rosetta. Proteins. 78 (16), 3437-3449 (2010).
  15. Ibarra, A. A., et al. Predicting and experimentally validating hot-spot residues at protein-protein interfaces. ACS Chemical Biology. 14 (10), 2252-2263 (2019).
  16. Chen, W., Srinath, H., Lam, S. S., Schiffer, C. A., Royer, W. E., Lin, K. Contribution of Ser386 and Ser396 to activation of interferon regulatory factor 3. Journal of Molecular Biology. 379 (2), 251-260 (2008).
  17. Mancino, A., Natoli, G. Specificity and function of IRF family transcription factors: Insights from genomics. Journal of Interferon & Cytokine Research. 36 (7), 462-469 (2016).
  18. Schwanke, H., Stempel, M., Brinkmann, M. M. Of keeping and tipping the balance: Host regulation and viral modulation of IRF3-dependent IFNB1 expression. Viruses. 12 (7), 733 (2020).
  19. Chen, W., et al. Insights into interferon regulatory factor activation from the crystal structure of dimeric IRF5. Nature Structural & Molecular Biology. 15 (11), 1213-1220 (2008).
  20. Pettersen, E. F., et al. UCSF Chimera: A visualization system for exploratory research and analysis. Journal of Computational Chemistry. 25 (13), 1605-1612 (2004).
  21. Crooks, G. E., Hon, G., Chandonia, J.-M., Brenner, S. E. WebLogo: A sequence logo generator. Genome Research. 14 (6), 1188-1190 (2004).
  22. Panne, D., McWhirter, S. M., Maniatis, T., Harrison, S. C. Interferon regulatory factor 3 is regulated by a dual phosphorylation-dependent switch. The Journal of Biological Chemistry. 282 (31), 22816-22822 (2007).
  23. Weihrauch, D., et al. An IRF5 decoy peptide reduces myocardial inflammation and fibrosis and improves endothelial cell function in tight-skin mice. PLoS One. 11 (4), e0151999 (2016).
  24. Mori, M., Yoneyama, M., Ito, T., Takahashi, K., Inagaki, F., Fujita, T. Identification of Ser-386 of interferon regulatory factor 3 as critical target for inducible phosphorylation that determines activation. The Journal of Biological Chemistry. 279 (11), 9698-9702 (2004).
  25. Smith, C. A., Kortemme, T. Predicting the tolerated sequences for proteins and protein interfaces using RosettaBackrub flexible backbone design. PLoS One. 6 (7), e20451 (2011).
  26. Rubenstein, A. B., Pethe, M. A., Khare, S. D. MFPred: Rapid and accurate prediction of protein-peptide recognition multispecificity using self-consistent mean field theory. PLoS Computational Biology. 13 (6), e1005614 (2017).

Cite This Article
Cruz, H., Llanes, A., Fernández, P. L. Computational Prediction of Amino Acid Preferences of Potentially Multispecific Peptide-Binding Domains Involved in Protein-Protein Interactions. J. Vis. Exp. (203), e66314, doi:10.3791/66314 (2024).
