Dr. Carlos Peña-Reyes

Professor, HEIG-VD


Leveraging the Power of Machine Learning on Genomics: Its Role in the Development of Precision Therapies against Multi-Resistant Bacteria


The emergence and rapid worldwide spread of antibiotic resistance threatens medical progress. A promising alternative for fighting multi-resistant bacteria is to use their natural predators: bacteriophages, viruses that infect and kill bacteria. Because they are highly strain-specific, they have little impact on the human bacterial flora. This specificity, however, is also a serious limitation for rapid therapy development, since a matching bacteriophage must be found for each bacterium. Given the multitude of possible interactions to examine systematically, developing bacteriophages rapidly as an alternative to antibiotics requires a model that predicts the interactions between bacteria and bacteriophages. We are conceiving, implementing, and investigating an original approach, based on supervised machine learning, that aims to predict whether a given phage-bacterium pair will interact, using only their genomic information. As a first step, we extract annotated phage-bacterium pairs from public databases such as NCBI and PhageDB. Additional pairs are provided directly by our partner at UNIL, who sequences the organisms and performs in vitro tests. So far, we have built a database of more than 20,000 interactions and 6,000 different organisms. As a second step, we extract meaningful and informative features from the genomes, a crucial task given that some of the predictive models are trained on the resulting information. We have created more than 20 datasets using two kinds of features: domain-based scores of protein-protein interactions and statistics computed from protein primary structures.
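To make the second kind of feature concrete, here is a minimal sketch of one possible statistic computed from protein primary structures: the relative frequency of each amino acid. The abstract does not say which statistics are actually used, so the choice of amino-acid composition, the averaging over proteins, and the names `composition_features` and `pair_features` are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_features(protein: str) -> List[float]:
    """Relative frequency of each standard amino acid in one protein sequence."""
    counts = Counter(protein.upper())
    # Non-standard symbols (e.g. X) are ignored in the total.
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def mean_composition(proteins: Sequence[str]) -> List[float]:
    """Average the composition vectors of all proteins of one organism."""
    vectors = [composition_features(p) for p in proteins]
    if not vectors:
        return [0.0] * len(AMINO_ACIDS)
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def pair_features(phage_proteins: Sequence[str],
                  bacterium_proteins: Sequence[str]) -> List[float]:
    """Hypothetical pair descriptor: phage vector followed by bacterium vector."""
    return mean_composition(phage_proteins) + mean_composition(bacterium_proteins)


# Toy sequences, not real proteins.
features = pair_features(["MKKA"], ["AAAA", "CCCC"])
print(len(features))  # 40 values: 20 for the phage, 20 for the bacterium
```

Each phage-bacterium pair then becomes a fixed-length numeric vector that a supervised model can consume, regardless of how many proteins each organism encodes.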
As a third step, to build our predictive models, we explore several radically different and complementary approaches: (1) ensemble learning, which combines several different supervised machine-learning models through a voting method; (2) one-class learning, based solely on positive (i.e., interacting) pairs, as these are the ones most often reported in the literature; and (3) a deep-learning approach intended to work mainly from the organisms’ whole genomes. The current results, which show high predictive power (ranging from 80 to 90%), encourage us to continue exploring these approaches. Finally, we will include in our AI system a selection of the best models, which will be used to produce, for instance, a list of candidate therapeutic phages able to infect a given bacterium. These phages might then be tested in the laboratory to confirm their efficacy.
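The voting idea behind the ensemble approach, and the way it could yield a list of candidate phages, can be sketched as follows. This is an illustration under stated assumptions, not the system described in the talk: the three threshold rules stand in for trained models, the two-value feature vectors and phage names are invented, and the helpers `vote_fraction` and `candidate_phages` are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A classifier maps a feature vector to 1 (interacts) or 0 (does not interact).
Classifier = Callable[[Sequence[float]], int]


def vote_fraction(classifiers: Sequence[Classifier],
                  features: Sequence[float]) -> float:
    """Fraction of ensemble members that predict an interaction."""
    return sum(clf(features) for clf in classifiers) / len(classifiers)


def candidate_phages(classifiers: Sequence[Classifier],
                     phage_features: Dict[str, Sequence[float]],
                     threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Keep phages backed by a strict majority, ranked by vote fraction."""
    scored = [(name, vote_fraction(classifiers, feats))
              for name, feats in phage_features.items()]
    kept = [(name, score) for name, score in scored if score > threshold]
    # Highest agreement first; ties are broken alphabetically by name.
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Toy stand-ins for trained models (assumption: simple threshold rules).
toy_ensemble: List[Classifier] = [
    lambda f: int(f[0] > 0.5),
    lambda f: int(f[1] > 0.5),
    lambda f: int(f[0] + f[1] > 1.0),
]

# Hypothetical feature vectors for three phages against one target bacterium.
toy_phages = {
    "phage_A": [0.9, 0.8],
    "phage_B": [0.9, 0.2],
    "phage_C": [0.1, 0.2],
}

ranking = candidate_phages(toy_ensemble, toy_phages)
for name, score in ranking:
    print(name, round(score, 2))
```

Ranking by vote fraction rather than returning a bare yes/no gives the laboratory an order in which to test candidates, with the phages on which the models agree most placed first.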


Carlos Peña-Reyes is a full professor of computer engineering at the University of Applied Sciences Western Switzerland in Yverdon (HEIG-VD). His research team, Computational Intelligence for Computational Biology (CI4CB), is affiliated with the Swiss Institute of Bioinformatics (SIB). His research explores the interface between the life sciences and the computational sciences, mainly bio-inspired computational intelligence on the one hand and bioinformatics and computational biology on the other.