Collaborations

Protein Dynamics at the Interface of Genomics and Disease Phenotype

Perhaps one of the most successful initiatives of the Center for Structural Biology (CSB) at Vanderbilt was the establishment of the Personalized Structural Biology (PSB) program, led initially by Jens Meiler. Focused on computational modeling of disease-linked mutations, particularly those of the monogenic class, this program excited colleagues in the genomics and genetics space, particularly in the cancer center, and wove a network of collaborations that led to funding opportunities and scholarly contributions. A sister program, Molecular Basis of Genetic Diseases (MBGD), led by Hassane Mchaourab with contributions from Jens Meiler, focused on the experimental side of the genotype/phenotype relationship by providing an experimental platform spanning the test tube to model organisms.

Much of the effort of both programs focused initially on answering the rather simple question of whether a mutation is ‘stabilizing’ or ‘destabilizing’, primarily by probing stability and function. The long-term goal of both programs is to tackle the more interesting and impactful question of how a mutation that does not unfold the protein shifts the energetic preference between intermediates on the energy landscape, thereby affecting the kinetics and thermodynamics of function or of interactions with other proteins. Favoring one conformation, or suppressing the population or lifetime of another, is likely to be the more prevalent mechanism of disease. Targeting these mutations for therapeutic intervention requires a mechanistic understanding of the functional cycle and the associated dynamics.
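As a rough illustration of how a mutation can shift conformational preference without unfolding the protein, the sketch below applies the standard Boltzmann relation to a hypothetical two-state system. The two-state simplification, the function name, and the 1 kcal/mol shift are assumptions chosen for illustration only; they are not results from either program.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)

def two_state_populations(delta_g, temperature=298.15):
    """Return (p_a, p_b) for conformations A and B.

    delta_g is G_B - G_A in kcal/mol (hypothetical input);
    a positive value means B is less favorable than A.
    """
    # Equilibrium ratio [B]/[A] from the Boltzmann relation.
    ratio = math.exp(-delta_g / (R_KCAL * temperature))
    p_b = ratio / (1.0 + ratio)
    return 1.0 - p_b, p_b

# Hypothetical wild type: both conformations equally populated.
wild_type = two_state_populations(0.0)
# Hypothetical mutant: conformation B destabilized by 1 kcal/mol.
mutant = two_state_populations(1.0)

print(f"wild type: A={wild_type[0]:.2f}, B={wild_type[1]:.2f}")
print(f"mutant:    A={mutant[0]:.2f}, B={mutant[1]:.2f}")
```

Under these assumptions, a shift of only 1 kcal/mol at room temperature lowers the population of conformation B from one half to roughly one sixth, even though neither state is unfolded.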

AI and Protein Dynamics in Drug Discovery

Vanderbilt has a strong reputation for industry-quality small molecule drug development, driven largely by the Warren Center for Neuroscience Drug Discovery helmed by Craig Lindsley. The AI revolution has had a substantially smaller impact on computer-aided drug design (CADD) and cheminformatics than on protein structure prediction. Indeed, quantitative structure-activity relationship (QSAR) modeling, which forms the basis of modern ligand-based high-throughput virtual screening (vHTS), has not notably improved over approaches developed in the 2000s and 2010s. This discrepancy is the result of differences in data availability and the task-specificity of QSAR modeling. The laboratory of Jens Meiler has played an instrumental role in developing and applying these modern cheminformatics tools at Vanderbilt and, in cooperation with Craig Lindsley, has a strong history of integrating computational and experimental approaches to discover new small molecule modulators of G-protein-coupled receptors (GPCRs) and other proteins.

The time is ripe to build on Vanderbilt’s unique strengths in small molecule drug discovery by extending CADD services broadly to the campus community in an “in silico first” initiative. There are notable areas in which AI is leading to a new state of the art in CADD. For example, protein-ligand docking is being revolutionized by AI models trained with geometric deep learning procedures (e.g., EquiBind, DiffDock), enabling the identification of likely binding modes of small molecules 1000x faster than, and with accuracy equivalent to, conventional force field-based docking approaches. These new AI tools, in combination with industry-standard conventional methods, have the potential to accelerate research in multiple domains at Vanderbilt. Central to the success of the CSB and the PSB program are outreach efforts to gain the insight and expertise of outside specialists in identifying high-impact areas of application. Supporting sustainable deployment of CADD efforts across campus will enhance the real-world impact of academic innovation.

There remain unsolved grand challenges in CADD that are ripe for reappraisal in the new AI era. Protein dynamics is one of them. No CADD method suitable for vHTS can also account for large-scale induced-fit changes. Essentially all modern methods, AI included, are limited to rigid-backbone predictions with sidechain refinement. Induced fit and conformational selection still require more expensive simulations that are fundamentally incompatible with screening more than a few hundred molecules, even in the most generous instances. AI systems incorporating reduced-MSA, RoseTTAFold, and additional algorithmic modifications could, for example, be developed to provide groundbreaking large-scale induced-fit capabilities.

Another unsolved challenge in CADD is rapid binding affinity prediction. In the mid-to-late 2010s there was much enthusiasm for training graph-based deep neural networks to predict binding affinities from protein-ligand complexes; however, such methods have since been shown to be riddled with training set bias and ill-suited for prospective screening. Thus, the gold-standard binding affinity approaches are still physics-based estimates, such as free energy perturbation or potential of mean force methods, which are computationally prohibitive for vHTS and relegated to small-scale lead optimization. Despite these challenges, increasing the generalizability and quality of rapid AI-assisted binding affinity estimates is a tractable problem. It could be addressed through experimental and computer-aided data augmentation coupled with debiasing the feature space of protein-ligand interactions, for example through feature engineering or message-passing neural networks.

Rationale for an AI-Focused Center

There is little doubt that AI is poised to radically shape the future of biomedical research at multiple levels in most fields, including protein science. Repositioning the basic sciences, and one can argue the institution as a whole, is a pressing strategic need. We highlighted above a few of the unique opportunities that exist at the interface of AI and protein dynamics, with return on investment in deciphering genetic diseases and in drug discovery. The Center will make strategic investments in computational infrastructure, efficiently pool resources, and catalyze collaborations with scientists in the basic sciences and the university as a whole.

We foresee a model built around an integrated group of faculty, closer to a U01-style grant than to a loose federation of R01 investigators. One of us (Hassane Mchaourab) previously proposed, in the context of a university-wide AI initiative, a hub-and-spoke model that we think would work here, albeit at a smaller scale. The hub will consist of five faculty (other than Mchaourab) who will work on the development, adaptation, and implementation of AI technologies and MD approaches, with applications in their own groups to problems closely linked to the mission of the Center. The spokes will consist of a cohort of expert research scientists and software engineers with multiple duties. They will provide coding expertise for the Center investigators, oversee a computational hardware core that would be established specifically for the Center, establish infrastructure for the sustainable deployment of technologies developed by the Center, and act as the interactive arm with investigators on campus, thus enabling the application of methods to specific biomedical research problems. These interactions will highlight areas of development that the hub could consider depending on impact, relevance to the Center, and intellectual bandwidth. Critically, these non-faculty staff scientist positions will offer an opportunity to attract experts to the Center who would otherwise be recruited to the biotech industry.

The Center will have a training component that will involve students interested in computer and data sciences and in biomedical sciences. The idea is to train the future workforce at the interface of these disciplines. The inclusion of expert non-faculty staff within this academic environment will not only increase the productivity of the Center but also enhance its effectiveness as a training environment for our graduate students. This concept will be implemented at a pilot level next summer through a collaboration between the MBGD and the Data Science Institute. We believe that such a training program will not only be fundable but, in the context of the new Center, will also enhance the education and training missions of Vanderbilt University.

In addition to its scholarly impact, the Center will position Vanderbilt to take advantage of the federal resources that will flow into AI and its applications in the biomedical sciences. The success of the Center will spur investments in AI at the campus level. Vanderbilt is woefully behind national trends when it comes to computation in general and AI in particular.