Marie Skłodowska-Curie Actions

PhD position 12 – MSCA COFUND, AI4theSciences (PSL, France) - “Data-driven Enzyme Evolution”

This job offer has expired

    Université PSL
    First Stage Researcher (R1)
    26/02/2021 23:00 - Europe/Brussels
    France › Paris
    H2020 / Marie Skłodowska-Curie Actions COFUND


“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European innovation and research programme Horizon 2020-Marie Skłodowska-Curie Actions, AI4theSciences is uniquely shaped to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to help them master the latest technologies in Artificial Intelligence and Machine Learning that apply to their own field.

26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.


Description of the PhD subject: “Data-driven Enzyme Evolution”


Context - Motivation

Understanding current enzymes and designing new ones are major problems in biology and biotechnology. Currently, the most successful approach to designing enzymes with novel properties is directed evolution, a methodology that mimics in the lab the process of evolution by mutation and selection. The field has recently undergone major technological advances with the development of high-throughput emulsion methodologies, which allow for the screening of very large populations of variants. Yet the capabilities of directed evolution protocols are still limited, in part because the information generated at each generation is used inefficiently. Recent high-throughput sequencing techniques allow monitoring of the full variant population over selection rounds, and therefore open the possibility of coupled in vitro/in silico evolution. Here we propose to turn directed evolution into a quantitative approach by combining the latest experimental technologies with advanced analysis of large-scale sequencing data. At a fundamental level, we wish to explore the determinants of function in enzymes. Application-wise, this integration of theory with experiments will enable a more powerful use of directed evolution for enzyme design, with potential applications in the agro and pharma industries.


Current implementations of directed evolution for enzyme design are subject to experimental and conceptual limitations:

  1. Selection for catalytic activity is done indirectly, in most cases by (over)expressing the enzyme in a living organism, which gives rise to toxicity effects and limits the throughput.
  2. Implementing more than a few cycles of mutation and selection is tedious if not impossible, limiting the exploration range to a few mutations away from the starting point.
  3. The diversification process at each generation is often limited to random mutations, disregarding (or poorly exploiting) previously accumulated knowledge.

The project proposes to overcome two of these limitations, (1) and (2), in the context of a new experimental approach recently demonstrated in the Gulliver Lab at ESPCI - PSL, and to combine this approach with AI to overcome (3). To build ‘smart’ libraries and optimally guide the population of catalysts in the fitness landscape, the main innovation of the project is to exploit large-scale data that we will produce by high-throughput sequencing. Inspired by previous analyses of protein sequence datasets, developed by O. Rivoire and collaborators, and by several recent applications of machine learning to biomolecules, we will infer from the sequencing data mathematical models for the relation between the sequence of an enzyme, its catalytic activity and its selective advantage. Using these models, we will be able to (i) recursively design improved libraries based on the output of previous experiments and (ii) eliminate confounding factors when estimating enzymatic activity. More generally, the models will incorporate experimental parameters, which will permit the optimization of the experimental workflow.


Scientific Objectives, Methodology & Expected results

Experimental system

Our experimental platform is a coupled expression-replication system based on the phage Phi29 replication mechanism, reconstituted in an in vitro protein expression system (PURE system). Starting from a wild-type enzyme of interest, this platform bypasses screening and can apply a functional selection pressure (selecting for catalytic activity of the encoded proteins) on populations of genetic variants at ultra-high throughput, up to 10 mutants per round. Next-generation sequencing, which provides similar throughput, can then be performed to reveal the composition of the population of variants before and after selection. A simple analysis of these data reveals which variants are enriched and therefore the most fit. More interestingly, statistical models of the relation between sequence and fitness inferred from these data can predict sequences that are not present in the initial dataset but have high fitness. These sequences can then be synthesized to make a new data-informed population of variants that serves as input to subsequent selection experiments.
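As an illustration of the "simple analysis" mentioned above, the sketch below computes a log2 enrichment score for each variant from read counts before and after one selection round. It is a minimal example under stated assumptions, not the lab's actual pipeline: the function name `log_enrichment`, the pseudocount of 1 and the toy read counts are all hypothetical.

```python
import math
from collections import Counter


def log_enrichment(before, after, pseudocount=1.0):
    """Return the log2 enrichment of each variant between two samples.

    `before` and `after` map a variant sequence to its read count.
    A pseudocount (an assumption of this sketch) avoids division by
    zero for variants seen in only one of the two samples.
    """
    variants = set(before) | set(after)
    total_before = sum(before.values()) + pseudocount * len(variants)
    total_after = sum(after.values()) + pseudocount * len(variants)
    scores = {}
    for variant in variants:
        freq_before = (before.get(variant, 0) + pseudocount) / total_before
        freq_after = (after.get(variant, 0) + pseudocount) / total_after
        scores[variant] = math.log2(freq_after / freq_before)
    return scores


# Toy read counts (made up for illustration only).
before = Counter({"ACDE": 100, "ACDF": 100, "GCDE": 100})
after = Counter({"ACDE": 250, "ACDF": 40, "GCDE": 10})

scores = log_enrichment(before, after)
ranked = sorted(scores, key=scores.get, reverse=True)
for variant in ranked:
    print(f"{variant}\t{scores[variant]:+.2f}")
```

Variants with a positive score increased in frequency under selection. Such scores are only a proxy for fitness, since they also reflect sampling noise and expression effects, which is one motivation for the model-based inference described in the next section.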


Data analysis and modeling

Models will be developed to infer catalytic activities from selection/sequencing data, to design new libraries and to optimize the experiment. A computational framework has already been developed in the team of Olivier Rivoire in the context of antibody evolution, and machine learning approaches are being actively explored by a number of groups to integrate genetic and proteomic data. We will start from relatively simple models inspired by statistical mechanics (Potts models) that have already demonstrated their relevance for describing the relation between the sequence and function of proteins. We will then elaborate on them to incorporate features of more complex neural network models (e.g., CNN, LSTM) that are still largely under-exploited in the context of proteins, despite the promise that they hold.
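For readers unfamiliar with Potts models, the sketch below evaluates the standard Potts energy of a protein sequence, E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), where lower energy corresponds to higher model probability. It is an illustrative example only: the fields `h` and couplings `J` are random placeholders rather than parameters inferred from data, and the names `potts_energy`, `h` and `J` are assumptions of this sketch.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def potts_energy(seq, h, J):
    """Potts energy of `seq`: minus the site fields, minus the pair couplings.

    h[i][a] is the field for amino acid a at position i.
    J[(i, j)][a][b] is the coupling between a at i and b at j, for i < j.
    """
    idx = [INDEX[aa] for aa in seq]
    energy = -sum(h[i][a] for i, a in enumerate(idx))
    for i in range(len(idx)):
        for j in range(i + 1, len(idx)):
            energy -= J[(i, j)][idx[i]][idx[j]]
    return energy


# Random placeholder parameters for a length-4 sequence (not fitted to data).
random.seed(0)
L, q = 4, len(ALPHABET)
h = [[random.gauss(0.0, 1.0) for _ in range(q)] for _ in range(L)]
J = {
    (i, j): [[random.gauss(0.0, 0.1) for _ in range(q)] for _ in range(q)]
    for i in range(L)
    for j in range(i + 1, L)
}

energy = potts_energy("ACDE", h, J)
print(f"Potts energy of ACDE: {energy:.3f}")
```

In practice the parameters would be inferred from sequencing data, and the pairwise couplings are what allow the model to capture interactions between positions that a site-independent model would miss.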

Models inspired by statistical mechanics, which correspond to a fully connected neural network with no hidden units, have indeed been demonstrated to be effective at describing sequence-to-function relationships in natural proteins or directed evolution. We will then extend our models to include hidden units, including deep neural networks, to determine which models are most appropriate for our context and datasets. All the tested models will be generative, i.e., they permit the design of new sequences with properties similar to, or improved over, those of the training set. We will use this feature to validate the approach by computationally proposing new sequences to be tested individually or included in the new-generation libraries. By iteratively feeding the model with experimental data and the experiments with theoretical predictions, we will form a virtuous cycle that constitutes the main strength of this proposal.



Data will be generated with the proposed experimental system. We envision that the first year of the PhD will consist of taking control of the experimental platform and applying existing Potts models, the second year of analyzing more elaborate statistical models for improved prediction and validation of the results, while the third year will be dedicated to applying the developed method to a particularly challenging problem of enzymatic evolution, for instance converting the specificity of an enzyme of industrial interest.


International mobility

The team of the co-supervisor is currently collaborating with the group of Rama Ranganathan at the University of Chicago (USA) with the goal of developing new experimental and theoretical approaches to understand proteins through evolution. The two teams are the recent recipients of a grant from the FACCTS program to promote collaborations between PSL and the University of Chicago (project entitled "Evolution-Based Engineering of Protein Function"). The recruited student will be able to benefit from this program and perform part of their project in Chicago.


Thesis supervision

Yannick Rondelez and Olivier Rivoire



Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).

More Information


  • Opportunity to conduct academic research in a top 100 university in the world.
  • High-quality doctoral training rewarded by a PhD degree, prepared within ESPCI - PSL and delivered by PSL.
  • Access to cutting-edge infrastructures for research & innovation.
  • Appointment for a period of 36 months (job contract delivered by the involved component school of PSL) based on a salary of 3100 € gross employer (including employer tax) per month or approximately a 2228 € gross salary per month.
  • Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64h/year for teaching, 32 days/year for specific missions).
  • Short stay(s) or secondment in France or abroad are expected.
  • An international environment supported by the adherence to the European Charter & Code.
  • Access to an AI training package, with a strong interdisciplinary focus, together with a Career Development Plan.

Eligibility criteria

  • Applicants must have a Master’s degree (or be in the process of obtaining one) or have a University degree equivalent to a European Master’s (5-year duration) to be eligible at the time of the deadline of the relative call.
  • There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
  • Applicants must declare to be available to start the programme on schedule.

For submitting your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...


The online application should contain the following documents:

  • English translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
  • International curriculum vitae and a cover letter explaining the applicant's reasons for preparing a PhD, why he/she is applying to this offer and his/her professional project (guidelines will be given to applicants to help them write the letter).
  • Two academic reference letters.
  • A statement duly signed on the mobility rules, availability, and conflicts of interest.

Applicants can only apply to one PhD project among the available ones. Multiple applications by one candidate will automatically make all his/her applications ineligible.

Selection process

The applications will be checked by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence, experience, motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.

All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.


The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit-based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.

Additional comments

The Gulliver Lab at ESPCI - PSL gathers theorists and experimentalists working at the interface of Physics, Chemistry, Biology and Computer science. Its research focuses on soft matter (colloids, polymers, liquid crystals, granular matter, thin sheets...), active matter (self-propelled colloids, swimming droplets, walking grains, swarms of robots...) and molecular systems (DNA, RNA, enzymes...).


ESPCI - PSL is a leading French “Grande Ecole” founded in 1882, part of Université PSL, educating undergraduate and graduate students through a programme merging basic science and engineering, as well as a world-renowned research institution. ESPCI - PSL has established a tradition of excellence in research, with distinguished faculty that have contributed to its history, such as Pierre and Marie Curie, Paul Langevin, Frédéric Joliot-Curie, Pierre-Gilles de Gennes and Georges Charpak. The five Nobel laureates in this list are emblematic of the exceptional ethos embodied in the permanent culture of excellence at ESPCI Paris.

ESPCI - PSL hosts 11 research units, all associated with CNRS and/or INSERM and/or other Parisian universities in the form of joint research units, covering the fields of physics, chemistry and biology. Favouring interdisciplinarity and operating at the frontier between fundamental research and innovation are two major objectives of ESPCI - PSL. This is achieved through a flexible organisation (without departments) that ensures cross-fertilization between scientific disciplines, as well as a direct connection between basic science and applications. One of ESPCI - PSL’s distinctive features is that it carries out fundamental research into areas of major interest to industry, while developing various approaches to practical industrial problems through a deep, fundamental understanding of the mechanisms at play. Performing fundamental research while keeping an eye on applications enables ESPCI - PSL research scientists to make an impact at multiple levels.


Over the last 10 years, scientists at ESPCI - PSL have published more than one scientific paper a day, applied for one patent a week and created several technology-driven start-ups every year. ESPCI - PSL has long experience in the management of European projects. With collective guidance playing a large part in the management of the consortium, the organisation will provide strong administrative support to such projects, with dedicated staff for financial, legal, and administrative issues.


Required Research Experiences

    1 - 4

Offer Requirements

    Chemistry: Master Degree or equivalent
    ENGLISH: Excellent


We expect a candidate with:

  • A desire to perform both experimental and theoretical research.
  • A background in Mathematics, Physics, Chemistry, Machine Learning and/or Biochemistry.
  • A keen interest in interdisciplinary applications to Biology and Chemistry.

Work location(s)
1 position(s) available at
Gulliver Lab, ESPCI - PSL
10, rue Vauquelin

EURAXESS offer ID: 579333

