From quarks to drugs: a journey in technology transfer

Pietro Faccioli

1 Theoretical physics in the age of quantitative Life Science

Over the last century, physicists have been continuously involved in developing and testing approximation methods and numerical techniques to solve for the structure and the dynamics of specific interacting systems. Some of these powerful schemes have then been successfully applied to other areas of physics. For example, approximations conceived to model the dynamics of strongly interacting protons and neutrons were later exported to investigate the motion of strongly correlated electrons in solids and have become standard tools of theoretical chemistry.

On the other hand, for most of the last century, examples of cross-fertilization across different disciplines of science were relatively scarce. In particular, the theoretical physicist’s reductionist approach was often regarded by life scientists as unfit to cope with the intrinsic complexity of living organisms. However, during the last decade, cross-disciplinary research has finally been emerging as the new paradigm to enable the transition towards a fully quantitative and predictive Life Science. Many of the world-leading laboratories in this area now integrate scientists with backgrounds ranging from chemistry to physics, mathematics and computer science. At the same time, multi-disciplinary curricula and academic programs are being offered in many universities worldwide. This ongoing revolution has fostered the development of innovative diagnostic methods, triggered the conception of new therapeutic strategies, and even given rise to entirely new concepts, such as that of precision medicine.

With the rise of quantitative and computational approaches to Life Science, the theoretical physicist’s training in mathematical modeling and computing is becoming an important asset, in recognition of the fact that biological macromolecules are prototypical examples of complex many-body physical systems.

In this article, I will discuss how some mathematical methods originally developed to describe tunneling phenomena in subnuclear systems have been adapted and successfully applied to investigate the structural dynamics of biological macromolecules, and in particular proteins. This cross-disciplinary research enabled us to simulate for the first time the folding of fairly large, biologically relevant proteins, using realistic models with atomic resolution. Some unexpected phenomena that were revealed by these simulations led us to conceive a completely new paradigm for drug discovery, which is now being industrially pursued and may help to tackle pathologies that are currently considered untreatable.

2 The protein folding process

Proteins are the molecular “nano-machines” designed by evolution to support life. They come in different sizes and types and carry out very different functions. For example, they can be enzymes or antibodies, they can store oxygen, or they can be involved in cell replication or death. Since proteins play such a key role in living organisms, they are inevitably involved in all pathologies. Indeed, most pharmacological research aims at finding small molecules that can interfere with the biological function of specific target proteins.

Proteins are produced by the ribosome, a macromolecular complex that translates the genetic instruction encoded in the messenger RNA into a specific sequence of 20 different types of amino-acids, linked together to form a linear chain. Immediately after its synthesis, this so-called polypeptide chain begins to fold onto itself, rapidly collapsing into a molten globular state. There, it begins a long search for the unique conformation (called the native state) where it finally becomes structurally stable and can carry out its biological function.

As famously demonstrated by Anfinsen in 1957, the structure of a protein’s native state is uniquely determined by its sequence of amino-acids. Therefore, protein folding is the process through which the genetic information is ultimately translated into structural information and, consequently, into biological function. Predicting the structure of a protein’s native state given its amino-acid sequence is often referred to as the first part of the protein folding problem. The second part of the problem corresponds to predicting the folding pathways knowing the final destination, i.e. the native structure.

Using X-ray scattering or Nuclear Magnetic Resonance (NMR) it is now possible to reconstruct very accurately the three-dimensional structure of protein native states with an atomic level of resolution. However, no experimental technique is currently available that can reconstruct the dynamics of protein folding pathways with a comparable degree of resolution.

Physics-based computer simulations have in principle the potential to provide a powerful virtual microscope, with extremely high spatiotemporal resolution. Unfortunately, the standard algorithm used to predict the structural dynamics of macromolecules, molecular dynamics (MD), is fundamentally limited when it comes to protein folding simulations. This is because folding is a prototypical thermally activated rare event: a polypeptide chain must perform a huge number of “random failed folding attempts” before it finally finds its way to the native state. Therefore, in an MD simulation most of the computational time is wasted generating unsuccessful attempts, and computational resources are typically exhausted before even a single productive folding event can be observed. This limitation led many groups to develop more advanced algorithms, which are collectively referred to as enhanced sampling methods. These typically introduce additional assumptions or approximations, or require some prior information about the system under consideration to be provided as input.
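To make the rare-event bottleneck concrete, here is a toy sketch. It is not the simulation method discussed in this article, and everything in it is an illustrative assumption: a single particle in a one-dimensional double-well potential stands in for the polypeptide chain, the dynamics is overdamped Langevin dynamics in units where the thermal energy and the diffusion coefficient both equal 1, and the function names are hypothetical. The first function encodes the exponential (Arrhenius-like) growth of the waiting time with the barrier height; the second counts how many brute-force integration steps are spent before the first successful barrier crossing.

```python
import math
import random


def arrhenius_waiting_time(barrier_kT, prefactor_time=1.0):
    """Rough estimate of the mean waiting time before a barrier crossing.

    The waiting time grows exponentially with the barrier height, expressed
    here in units of the thermal energy k_B T. The prefactor is an assumption.
    """
    return prefactor_time * math.exp(barrier_kT)


def first_crossing_step(barrier_kT, dt=1e-3, max_steps=200_000, seed=0):
    """Brute-force overdamped Langevin dynamics in a toy double well.

    Potential: U(x) = barrier_kT * (x**2 - 1)**2, with minima at x = -1 and
    x = +1 separated by a barrier of height barrier_kT at x = 0.
    The particle starts at x = -1. Returns the step at which it first reaches
    x >= +1, or None if the step budget runs out first.
    """
    rng = random.Random(seed)
    x = -1.0
    noise_amplitude = math.sqrt(2.0 * dt)  # D = 1 in these toy units
    for step in range(1, max_steps + 1):
        force = -4.0 * barrier_kT * x * (x * x - 1.0)  # -dU/dx
        x += force * dt + noise_amplitude * rng.gauss(0.0, 1.0)
        if x >= 1.0:
            return step
    return None


if __name__ == "__main__":
    for barrier in (1.0, 2.0, 3.0):
        steps = first_crossing_step(barrier)
        print(f"barrier = {barrier} kT: first crossing at step {steps}, "
              f"Arrhenius factor = {arrhenius_waiting_time(barrier):.1f}")
```

Raising the barrier by a few units of thermal energy quickly pushes the first crossing beyond any fixed step budget, which is the same obstacle that brute-force MD faces for protein folding, only on a vastly larger scale.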

In particular, our group reformulated the problem of computing protein folding pathways using Feynman’s path integral formalism and introduced approximations that were originally invented to describe quantum tunneling transitions in subnuclear systems. This new way of framing the protein folding problem paved the way for devising new algorithms to accelerate the simulation of rare thermally activated molecular transitions. These improvements enabled us for the first time to simulate the folding of biologically relevant proteins, using realistic physics-based models. As an illustrative example, in fig. 1 we schematically represent a folding pathway taken by the protein Angiotensin-converting enzyme 2 (ACE2), a receptor of the SARS-CoV-2 virus. Even resorting to the most powerful existing special purpose supercomputer, observing a similar trajectory in an MD simulation would require hundreds of thousands of years of non-stop computing.
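For readers who want to see the shape of such a reformulation, the block below sketches the standard path-integral representation of overdamped Langevin (Brownian) dynamics. It is a textbook result offered as an illustration, under the assumption that the motion of the atomic coordinates x is modeled by overdamped Langevin dynamics; the symbols U (potential energy), D (diffusion coefficient) and β = 1/(k_B T) are introduced here and do not appear elsewhere in this article.

```latex
% Assumed model: overdamped Langevin dynamics with Gaussian white noise
\dot{x} = -\beta D \, \nabla U(x) + \eta(t),
\qquad
\langle \eta_i(t) \, \eta_j(t') \rangle = 2 D \, \delta_{ij} \, \delta(t - t')

% Probability of going from x_i to x_f in a time t, written as a sum over paths
P(x_f, t \mid x_i, 0) =
  e^{-\frac{\beta}{2} \left[ U(x_f) - U(x_i) \right]}
  \int_{x(0) = x_i}^{x(t) = x_f} \mathcal{D}x \; e^{-S_{\mathrm{eff}}[x]}

% Effective action and effective potential
S_{\mathrm{eff}}[x] = \int_0^t d\tau
  \left[ \frac{\dot{x}^2(\tau)}{4 D} + V_{\mathrm{eff}}\big(x(\tau)\big) \right],
\qquad
V_{\mathrm{eff}}(x) = \frac{D \beta^2}{4}
  \left[ \big( \nabla U(x) \big)^2 - \frac{2}{\beta} \nabla^2 U(x) \right]
```

Written this way, the expression has the same structure as a quantum-mechanical path integral in imaginary time, which suggests why approximations devised for tunneling problems can be carried over to thermally activated molecular transitions.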

It should be emphasized that the computational advantage of our algorithms does not come for free: indeed, to apply them, one must give up the ambition to solve the first part of the protein folding problem, i.e. to predict the native structure given the amino-acid sequence. However, our algorithms can very effectively solve the second part of the problem, i.e. accurately predict the patterns of structural rearrangement through which a given polypeptide chain reaches its given native structure. Fortunately, by now, hundreds of thousands of protein native structures have already been resolved using X-ray scattering or Nuclear Magnetic Resonance experiments, and this number keeps growing very rapidly. In addition, recent developments in bioinformatics applications of Artificial Intelligence have yielded very accurate predictions of protein structures, thus promising to enlarge the pool of proteins that can be simulated with our algorithms.

3 A new approach to drug discovery

Most conventional drug discovery is based on identifying small molecules that can interfere with a protein’s biological function, by binding to pockets that are present in its native structure.

Unfortunately, in many cases, the target protein is “undruggable” by this technique. For example, this may occur when the native state does not display usable pockets in the vicinity of its functional site. In addition, the conventional drug discovery approach invariably fails in the so-called misfolding pathologies. In these cases, the disease is correlated with the presence of incorrectly folded protein structures, which tend to form insoluble toxic aggregates.

Equipped with our powerful virtual microscope, a few years ago our group embarked on multi-disciplinary collaborations, aiming at unveiling different biophysical aspects of protein folding and exploring possible translational implications. In particular, a series of investigations performed in collaboration with the molecular biology laboratory led by Prof. E. Biasini at Trento University revealed that almost all biologically relevant proteins fold by visiting a rather well-conserved sequence of partially folded meta-stable states, called the folding intermediates. This finding contrasted with the folding mechanism of very small proteins, which usually occurs through a single-step co-operative transition. At that time, the idea that the folding of biologically relevant proteins could be far more complex than that of mini-proteins had already been suggested, but such folding intermediates had never been systematically predicted or characterized at an atomistic level of detail.

This finding immediately raised the question of whether it is possible to leverage the knowledge of folding intermediates to find new ways of inhibiting protein targets. Our answer is a method called Pharmacological Protein Inactivation by Folding Intermediate Targeting (PPI-FIT). The basic idea behind this scheme (see fig. 2) is to look for small molecules that can bind to pockets that are only present in transient folding intermediates, i.e. before the chain reaches its native state. This way the protein cannot proceed along the folding pathway and remains stuck in its intermediate. Cells are equipped with a quality control machinery that enables them to efficiently recognize and eliminate partially or incorrectly folded proteins. Therefore, by stopping the folding process, it is possible to induce the cellular degradation of the target protein. This approach can be applied to virtually any target, but is particularly useful for proteins that are undruggable with the conventional method.

The PPI-FIT approach was first validated on the human cellular prion protein (hereafter denoted as PrPc). This polypeptide chain represents the substrate of infective agents called prions, which consist of toxic aggregates of misfolded prion proteins, denoted as PrPsc. Prions are responsible for several invariably lethal neurodegenerative diseases, including the infamous mad cow disease. Like all misfolding pathologies, prion-related diseases are untreatable with the conventional method. On the other hand, by reducing the concentration of PrPc in cells, it is in principle possible to hamper the growth and subsequent accumulation of toxic aggregates of PrPsc, thus effectively hindering the prion infection.

Applying the PPI-FIT technology, we discovered several small molecules that have been demonstrated to effectively reduce the expression levels of PrPc proteins in cells. Figure 3 illustrates the effectiveness of a specific small molecule called SM875, which was discovered using PPI-FIT. Using a biochemical method called Western Blotting it is possible to determine the abundance of a given protein in cells. In this technique, the darkness of the grey spots is correlated with the amount of PrPc present in the cell. The figure clearly shows how the cellular concentration of PrPc decreases with an increasing cellular concentration of SM875.

The industrial application of the PPI-FIT technology is now being pursued by Sibylla Biotech S.R.L., a research spinoff of the Universities of Trento and Perugia and of the Italian National Institute for Nuclear Physics (INFN). Sibylla Biotech has already discovered a number of active small molecules in different therapeutic areas.

In July 2021, Sibylla Biotech was selected among the 8 world finalists of the prestigious Nature Spinoff Prize.

4 Anti-COVID-19 research

After the outbreak of the COVID-19 world pandemic, Sibylla Biotech and INFN joined forces with Trento and Perugia Universities to use the PPI-FIT scheme to look for an antiviral drug. In particular, the project aimed at lowering the expression levels of protein ACE2, a 600 amino-acid protein recognized by the SARS-CoV-2 virus in the early stage of the COVID-19 infection. The rationale behind this endeavor is that reducing the cellular expression levels of this protein would hamper the ability of the virus to penetrate the cell membrane and propagate the infection.

However, ACE2 was by far the largest protein for which an atomistic folding simulation had ever been attempted. INFN made available its huge computing infrastructure, which enabled scientists at Sibylla Biotech to generate several folding events.

Then, by applying the PPI-FIT approach, Sibylla’s researchers discovered that a few molecules that are already approved drugs for different therapeutic purposes, or are in an advanced stage of clinical trials, were predicted to bind to a folding intermediate of ACE2, thus reducing its expression levels. Later experiments performed on cells confirmed this prediction.

It remained to be proved that lowering the expression levels of ACE2 produces an anti-viral effect. Experiments performed on different cell cultures infected with SARS-CoV-2 showed that one of these drugs, called Artefenomel, is particularly effective in hindering the viral propagation. It should be emphasized that this is just the first step on a long path towards developing an antiviral drug for COVID-19. The effectiveness of these drug candidates on complex organisms, and in particular humans, remains to be investigated.

5 Sky is the new frontier: The ZePrion experiment

Drug discovery is a long and extremely expensive process. After a small molecule is shown to be active on a given target protein, its chemical structure must be optimized in order to increase its effectiveness, reduce its toxicity, and improve its availability in cells. This refinement procedure, which is called the hit-to-lead phase of drug development, is best performed by taking into consideration the structural and chemical information about the molecule-target interaction. In other words, the optimization of the drug candidate requires a very accurate knowledge of its binding pose and its local chemical environment.

In conventional rational drug discovery, the target is the protein native structure. Druggable pockets in this state can be identified by analyzing the three-dimensional structure which is available by X-ray crystallography or NMR. Hence, in the hit-to-lead phase it is possible to rely on extremely accurate structural and chemical information.

Conversely, in the PPI-FIT approach, the target is a protein folding intermediate, i.e. a transient meta-stable state. As such, its structure cannot be determined experimentally and the drug development phase must rely entirely on the structural information provided by the protein folding simulations. In principle, since the folding intermediate state can be stabilized by the interaction with the drug candidate, one may hope that X-ray experiments could still provide reliable structures of the protein-drug complex. Unfortunately, in practice, these experiments are extremely challenging, because when proteins are stuck in folding intermediates they develop a large propensity to aggregate and precipitate. As a result, it is very difficult to grow an ordered crystal of targets with the desired properties that are required to perform X-ray experiments.

A number of recent studies have highlighted striking advantages of performing protein crystallization in microgravity conditions. This is mostly due to the absence of convective motion, which on Earth can interfere with the formation of ordered crystals. This observation led us to conceive and design the ZePrion experiment, which will be conducted on the International Space Station (ISS) in 2022 (see fig. 4). In this experiment, the free-fall conditions that are realized in orbit will be exploited to attempt the crystallization of the PrPc protein in complex with SM875, the small molecule discovered using the PPI-FIT approach. The protein will first be melted and then allowed to refold in the presence of the small molecule, which should then stop the folding process by binding to its folding intermediate. The experimental apparatus consists of a miniaturized molecular biology laboratory that can be operated from Earth and was developed by the Israeli company SpacePharma. This apparatus will be carried to the ISS during the RAKIA Space Mission, using a SpaceX reusable launch vehicle. After the experiment, the samples will be brought back to Earth and structurally characterized. If this analysis confirms the formation of a good crystal, the sample will be sent to a synchrotron laboratory to be analyzed with X-rays. Should this ambitious experiment be successful, it will open an entirely new avenue for space-based pharmacological research.

6 Conclusions

This 15-year-long research journey has provided some very tangible examples of how blue-sky research can actually breed innovation. Indeed, without the efforts that led our fellow theoretical physicists to develop sophisticated mathematical methods to understand the jiggling of quarks in a proton, none of the algorithms we used to study the jiggling of a protein in water could have been conceived.

In addition, this and other projects carried out by other groups are demonstrating that physics-based modeling of macromolecular dynamics has finally reached a mature and productive phase. We foresee that in the coming years molecular models and simulations will play an increasingly important role in the race towards a fully predictive, quantitative and multidisciplinary Life Science.