Research

Welcome to the website of the COSBI (Computational and Systems Biology) group. Our research focuses on systems biology within the broad field of computational biology.

Biology is becoming increasingly data-driven, and computational approaches reach all aspects of medicine, from understanding disease to developing therapies. We are mostly interested in protein-protein interactions, and we develop computational methods (using, for example, machine learning or molecular docking) to predict protein interactions at genome scale.

Protein-protein interactions are at the center of inter- and intra-cell communication and signaling. Many diseases, such as cancer, involve malfunctioning proteins that result in erroneous signaling. We focus on precision medicine, more specifically on how proteins interact, how genomic variations and mutations rewire signaling, and how this relates to diseases, particularly cancer.

We:

  • Develop computational methods and approaches for large-scale structural modeling of protein-protein interactions
  • Develop methods to find critical residues (energy hot spots) at protein-protein interfaces
  • Integrate 3D structural data of protein-protein complexes into signaling pathways that play important roles in cancer and other diseases
  • Use our protein-protein interaction prediction tool (PRISM), which can carry out accurate predictions on the proteome scale, to construct the structural networks of signaling pathways
  • Provide and maintain databases and computational services for the methods developed in the group, for use by the community

CURRENT PROJECTS

Predicting Protein-Protein Interfaces using Deep Learning

Computational and Experimental Investigation of RAS Homodimer and Heterodimer Complexes and Interactions

Fast algorithms for Protein Interface Alignments

Methods for relating missense mutations to signaling pathways using structures

Relating genotypes to phenotypes in breast cancer metastasis using three-dimensional protein-protein interaction networks

PRISM

Prediction of protein-protein interactions at the structural level on the proteome scale is important because it allows prediction of protein function, helps drug discovery and takes steps toward genome-wide structural systems biology. The new PRISM web server enables fast and accurate prediction of protein-protein interactions (PPIs). The method consists of two components: rigid-body structural comparison of target proteins to known template protein-protein interfaces, and flexible refinement using a docking energy function. The PRISM rationale follows our observation that globally different protein structures can interact via similar architectural motifs. PRISM predicts binding residues by using structural similarity and evolutionary conservation of putative binding residue ‘hot spots’. The predicted models are stored in its repository. Given two protein structures, PRISM will provide a structural model of their complex if a matching template interface is available. Users can download the complex structure, retrieve the interface residues and visualize the complex model. The new version uses the template set from PIFACE, which contains 22604 interfaces.

HMI-PRED

Host-Microbe Interaction PREDictor (HMI-PRED) is a web server for structural prediction of protein-protein interactions (PPIs) between human and any microbial species. The rationale behind HMI-PRED is that if the microbial protein has a surface patch that is structurally similar to one face of the template interface and has similar evolutionarily conserved “hot spots”, it can interact with the host protein on the complementary face.

PIFACE

Improvements in experimental techniques increasingly provide structural data relating to protein-protein interactions. Classification of structural details of protein-protein interactions can provide valuable insights for modeling and abstracting design principles. Here, we aim to cluster protein-protein interactions by their interface structures, and to exploit these clusters to obtain and study shared and distinct protein binding sites. We find that there are 22604 unique interface structures in the PDB. These unique interfaces, which provide a rich resource of structural data of protein-protein interactions, can be used for template-based docking. We test the specificity of these non-redundant unique interface structures by finding protein pairs which have multiple binding sites. We suggest that residues with more than 40% relative accessible surface area should be considered as surface residues in template-based docking studies. This comprehensive study of protein interface structures can serve as a resource for the community.
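The 40% surface-residue criterion can be illustrated with a minimal sketch. It assumes relative accessible surface area values (in percent) have already been computed by an external tool; the residue identifiers, the function name and the example values are hypothetical and are not part of PIFACE.

```python
# Cutoff suggested in the text: residues with more than 40% relative ASA
# are treated as surface residues.
SURFACE_RSA_CUTOFF = 40.0


def surface_residues(relative_asa, cutoff=SURFACE_RSA_CUTOFF):
    """Return, in sorted order, the residue ids whose relative ASA exceeds the cutoff.

    relative_asa: dict mapping a residue id to its relative ASA in percent
    (hypothetical input format, chosen for illustration only).
    """
    return sorted(res for res, rsa in relative_asa.items() if rsa > cutoff)


# Made-up values for illustration only.
example = {"A:LYS12": 63.5, "A:LEU45": 8.2, "A:GLU77": 40.0, "A:ARG90": 41.3}
surface = surface_residues(example)
print(surface)
```

Because the text says "more than 40%", the comparison is strict, so a residue at exactly 40% is not counted as a surface residue in this sketch.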

HOTREGION

Hot spots are energetically important residues at protein interfaces and they are not randomly distributed across the interface but rather clustered. These clustered hot spots form hot regions. Hot regions are important for the stability of protein complexes, as well as providing specificity to binding sites. We propose a database called HotRegion, which provides the hot region information of the interfaces by using predicted hot spot residues, and structural properties of these interface residues such as pair potentials of interface residues, accessible surface area (ASA) and relative ASA values of interface residues of both monomer and complex forms of proteins. Also, the 3D visualization of the interface and interactions among hot spot residues are provided.
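The idea of clustered hot spots forming hot regions can be sketched as a connected-components search over predicted hot spots. The distance cutoff, the minimum cluster size, the input format and all names below are assumptions made for illustration; the text does not state how HotRegion defines neighboring hot spots.

```python
import math
from collections import deque


def hot_regions(coords, cutoff=6.5, min_size=3):
    """Group hot spots into clusters of spatially connected residues.

    coords: dict mapping a residue id to one representative (x, y, z) point.
    cutoff and min_size are illustrative assumptions, not values from the text.
    """
    unvisited = set(coords)
    regions = []
    for start in sorted(coords):
        if start not in unvisited:
            continue
        unvisited.remove(start)
        queue = deque([start])
        cluster = [start]
        # Breadth-first search: pull in every hot spot within the cutoff
        # of any residue already in the cluster.
        while queue:
            current = queue.popleft()
            near = [r for r in unvisited
                    if math.dist(coords[current], coords[r]) <= cutoff]
            for r in near:
                unvisited.remove(r)
                queue.append(r)
                cluster.append(r)
        if len(cluster) >= min_size:
            regions.append(sorted(cluster))
    return regions


# Made-up coordinates: three chained hot spots and an isolated pair.
hot_spots = {
    "A:10": (0.0, 0.0, 0.0),
    "A:11": (4.0, 0.0, 0.0),
    "A:14": (8.0, 0.0, 0.0),
    "B:50": (30.0, 0.0, 0.0),
    "B:51": (33.0, 0.0, 0.0),
}
regions = hot_regions(hot_spots)
print(regions)
```

In this example the first three residues form one region even though the outer two are farther apart than the cutoff, because they are linked through the middle residue; the pair is dropped for being smaller than the assumed minimum size.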

HOTPOINT

The energy distribution along the protein-protein interface is not homogeneous; certain residues, called ‘hot spots’, contribute more to the binding free energy. Here, we present a web server, HotPoint, which predicts hot spots in protein interfaces using an empirical model. The empirical model incorporates a few simple rules consisting of occlusion from solvent and total knowledge-based pair potentials of residues. The prediction model is computationally efficient and achieves an accuracy of 70%. The input to the HotPoint server is a protein complex and two chain identifiers that form an interface. The server provides the hot spot prediction results, a table of residue properties and an interactive 3D visualization of the complex with hot spots highlighted. Results are also downloadable as text files. This web server can be used for analysis of any protein-protein interface, and can be utilized by researchers working on binding site characterization and rational design of small molecules for protein
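A rule of the kind described above, combining occlusion from solvent with a total pair potential, might look like the following sketch. The two thresholds, the sign convention for the potential (higher meaning stronger contacts), the class and the field names are illustrative assumptions; the text does not give the actual HotPoint parameters.

```python
from dataclasses import dataclass


@dataclass
class InterfaceResidue:
    residue_id: str
    relative_asa_complex: float  # percent solvent accessibility in the complex
    total_pair_potential: float  # summed knowledge-based contact score


# Placeholder thresholds, assumed for illustration (not stated in the text).
MAX_RELATIVE_ASA = 20.0
MIN_PAIR_POTENTIAL = 18.0


def is_hot_spot(res, max_rel_asa=MAX_RELATIVE_ASA, min_potential=MIN_PAIR_POTENTIAL):
    """Flag a residue that is both occluded from solvent and strongly contacting."""
    return (res.relative_asa_complex <= max_rel_asa
            and res.total_pair_potential >= min_potential)


# Made-up interface residues for illustration only.
interface = [
    InterfaceResidue("A:TRP104", 5.1, 27.3),   # buried, strong contacts
    InterfaceResidue("A:SER33", 48.0, 22.0),   # strong contacts but exposed
    InterfaceResidue("B:ALA61", 3.0, 9.5),     # buried but weak contacts
]
predicted = [r.residue_id for r in interface if is_hot_spot(r)]
print(predicted)
```

Both conditions must hold, so a residue that is buried but weakly contacting, or strongly contacting but solvent-exposed, is not flagged.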

HOTSPRINT

HotSprint is a database of computational hot spots in protein interfaces. Hot spots are residues comprising only a small fraction of interfaces yet accounting for the majority of the binding energy. HotSprint contains data for 35 776 protein interfaces among 49 512 protein interfaces extracted from the multi-chain structures in the Protein Data Bank (PDB) as of February 2006. The conserved residues in interfaces with certain buried accessible solvent area (ASA) and complex ASA thresholds are flagged as computational hot spots. The predicted hot spots are observed to correlate with the experimental hot spots with an accuracy of 76%. Several machine-learning methods (SVM, Decision Trees and Decision Lists) are also applied to predict hot spots; the results reveal that our empirical approach performs better than the others. A web interface for the HotSprint database allows users to browse and query the hot spots in protein interfaces.
