
PhD position 13 – MSCA COFUND, AI4theSciences (PSL, France) - “Machine Learning for origin of life in the RNA world”

This job offer has expired

    Université PSL
    First Stage Researcher (R1)
    26/02/2021 23:00 - Europe/Brussels
    France › Paris
    H2020 / Marie Skłodowska-Curie Actions COFUND


“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European innovation and research programme Horizon 2020-Marie Skłodowska-Curie Actions, AI4theSciences is uniquely shaped to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to help them master the latest technologies in Artificial Intelligence and Machine Learning that apply to their own field.

26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.


Description of the PhD subject: “Machine Learning for origin of life in the RNA world”


Context - Motivation

This project will use unsupervised machine learning to analyze large RNA sequence datasets, then generate RNA sequences capable of making copies of themselves. The success rate of the generative model will be tested experimentally and used to compute the first statistically informed estimate of the probability of self-reproduction during the origin of life.


Is the appearance of life an exceptional or a likely event? The most accepted origin of life scenario, called ‘the RNA world’, is that primordial self-reproducing genetic systems were made of RNA (ribonucleic acids) instead of DNA, RNA and proteins. However, we currently know only one RNA sequence capable of making copies of itself. This sequence is 200 nucleotides long, which makes its appearance very implausible: 1 in 10^120 in the space of random sequences of the same length, to be compared to 10^80 atoms in the observable universe. This particular RNA, found in a bacterium called Azoarcus, has been re-engineered to assemble copies of itself from fragments. Such a construct provides an experimental proof-of-principle that self-reproduction is possible in the RNA world. However, it does not address the question of the plausibility of self-reproduction. Indeed, large RNA datasets have not yet been exploited to infer the property of self-reproduction, nor is it possible to compute the function of RNAs from first principles.
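The order of magnitude quoted above can be checked by counting sequences over the four-letter RNA alphabet. The short sketch below assumes that every position is drawn independently and uniformly from A, C, G and U, and that exactly one sequence of that length is functional; the variable names are illustrative only.

```python
import math

ALPHABET_SIZE = 4      # A, C, G, U
SEQUENCE_LENGTH = 200  # length of the self-reproducing RNA quoted in the text
LOG10_ATOMS = 80       # order of magnitude quoted in the text for atoms in the observable universe

# Number of distinct sequences of this length, expressed as a power of ten.
log10_sequences = SEQUENCE_LENGTH * math.log10(ALPHABET_SIZE)

# Exact count as a Python integer, to cross-check the logarithm.
exact_count = ALPHABET_SIZE ** SEQUENCE_LENGTH

print(f"4^{SEQUENCE_LENGTH} is about 10^{log10_sequences:.1f}")
print(f"Number of decimal digits in the exact count: {len(str(exact_count))}")
print(f"Ratio to the atom count: about 10^{log10_sequences - LOG10_ATOMS:.1f}")
```

Under these assumptions the sequence space is slightly larger than 10^120, so the chance of drawing one specific sequence at random is of the order of 1 in 10^120.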


Scientific Objectives, Methodology & Expected results

We will leverage the diversity of large families of RNAs found in organisms in order to generate primordial RNA candidates capable of self-reproduction. We will rely on the Group I Intron Sequence Database, which comprises ~17,000 RNA sequences that possess the same basic catalytic functions as the Azoarcus RNA and can in principle be engineered for self-reproduction. This database will be used as an input for a machine learning technique called Direct Coupling Analysis (DCA), developed by the co-supervisor Martin Weigt. DCA extracts statistical signatures of conserved co-variations in families of evolutionarily related sequences. Because DCA is computationally efficient, it allows the analysis of large families and was successfully applied to generate artificial proteins, showing that artificial sequences with as much as 70% sequence divergence were functional. In this project, we will apply DCA to RNA instead of proteins, with the aim of generating artificial sequences that are statistically indistinguishable from natural RNAs. The resulting generative model will be tested experimentally for self-reproduction in the laboratory of Philippe Nghe. It will be combined with the folding prediction algorithm developed by Matteo Smerlak. The experimentally measured success rate will determine a lower bound on the probability of self-reproduction, which has not been done so far. In turn, the experimental data from artificially generated sequences that are found to be functional will expand the dataset and serve as an input to refine the DCA analysis and structure prediction. Each iteration between machine learning and experiment will thus improve the estimate of the probability of self-reproduction in the RNA world and the predictability of RNA structure and function from its sequence.
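To give a feel for what "co-variation" means in a sequence family, the sketch below computes the mutual information between pairs of alignment columns. This is not DCA itself: mutual information is a simpler pairwise statistic, whereas DCA fits a global model to separate direct couplings from indirect correlations. The toy alignment is invented for illustration and does not come from the Group I Intron Sequence Database.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy alignment (invented, not real intron data).
# Columns 0 and 3 co-vary as base pairs; column 1 is fully conserved.
ALIGNMENT = [
    "GAUC",
    "GACC",
    "CAUG",
    "CACG",
    "AAUU",
    "UACA",
]

def column(alignment, i):
    """Return the list of nucleotides found at position i."""
    return [seq[i] for seq in alignment]

def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    freq_i = Counter(column(alignment, i))
    freq_j = Counter(column(alignment, j))
    freq_ij = Counter(zip(column(alignment, i), column(alignment, j)))
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi

def covariation_scores(alignment):
    """Score every pair of columns by mutual information."""
    length = len(alignment[0])
    return {
        (i, j): mutual_information(alignment, i, j)
        for i, j in combinations(range(length), 2)
    }

scores = covariation_scores(ALIGNMENT)
best_pair = max(scores, key=scores.get)
print("Most strongly co-varying columns:", best_pair)
```

In this toy case the base-paired columns 0 and 3 receive the highest score, while any pair involving the conserved column 1 scores zero, since a column that never varies carries no co-variation signal.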

Analyzing large sequence datasets is the only tool available at the moment to infer functional properties of biomolecules at the scale of the sequence space. The main reasons for this are:

  1. The sequence space is so huge (10^120 in our case) that global questions such as ‘how common is molecular self-reproduction?’ can only be answered in a statistical manner. This excludes traditional approaches in biochemistry, which focus on characterizing one or a few molecules in depth.
  2. Even for a single RNA sequence, the number of structural configurations is astronomically large and the optimization highly non-convex, so that activity prediction has resisted more than 40 years of developments based on molecular simulations. In particular, existing RNA folding algorithms cannot even correctly predict the Azoarcus RNA structure. Therefore, other inference techniques are needed.
  3. Large datasets of functional sequences are now available thanks to high-throughput DNA sequencing, either from organisms or from mutants generated in the laboratory. Statistical patterns in these functional sequences provide enormous information on their structure and activity, but extracting such information requires modern machine learning tools such as DCA, which are efficient enough to decipher correlations between several thousand sequences.


The origin of life is one of the big fundamental questions. For instance, it was voted the number one question that science should address in a consultative poll of the general public in the Netherlands, giving rise to a large nationwide funding scheme in 2019 (oLife initiative). However, because the origin of life is perceived as an essentially curiosity-driven question, its scientific treatment has mostly remained theoretical and speculative. Consequently, the origin of life is often used by non-scientific forms of explanation to question the validity of the scientific method itself. It is therefore crucial to provide evidence based on data, now that such data is available, experiments are possible, and machine learning can extract meaningful information from this data. Furthermore, this project will advance our understanding of the general relationship between sequence and function in RNAs, which has applications beyond the specific RNA family studied here, notably in biomedical sciences and synthetic biology, as it allows programmable systems that can be implemented in vitro and in vivo for diagnosis and drug delivery.


International mobility

The PhD candidate is expected to travel at least yearly for an extended stay (1-3 months/year) at the Max Planck Institute for Mathematics in the Sciences (Leipzig, Germany), hosted by Matteo Smerlak, to test and combine the DCA technique with RNA folding algorithms. He or she will also interact with, and have the possibility to visit, the collaborators of the HFSP consortium: the Ramesh lab (National Center for Biological Sciences, Bangalore, India), which determines RNA structures experimentally, and the Hayden lab (Boise State University, Boise, USA), which performs experimental directed evolution of RNA.


Thesis supervision

Philippe Nghe and Martin Weigt



Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).

More Information


  • Opportunity to conduct academic research in a top 100 university in the world.
  • High-quality doctoral training rewarded by a PhD degree, prepared within ESPCI - PSL and delivered by PSL.
  • Access to cutting-edge infrastructures for research & innovation.
  • Appointment for a period of 36 months (job contract delivered by the involved component school of PSL) based on a salary of 3100 € gross employer (including employer tax) per month or approximately a 2228 € gross salary per month.
  • Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64h/year for teaching, 32 days/year for specific missions).
  • Short stay(s) or secondment in France or abroad are expected.
  • An international environment supported by the adherence to the European Charter & Code.
  • Access to AI training package, with a strong interdisciplinary focus, together with a Career development Plan.

Eligibility criteria

  • Applicants must have a Master’s degree (or be in the process of obtaining one) or have a University degree equivalent to a European Master’s (5-year duration) to be eligible at the time of the deadline of the relative call.
  • There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
  • Applicants must declare to be available to start the programme on schedule.

For submitting your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...


The online application should contain the following documents:

  • English translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
  • International curriculum vitae and a cover letter explaining the applicant's reasons for undertaking a PhD, why he/she is applying to this offer, and his/her professional project (guidelines will be given to applicants to help them write the letter).
  • Two academic reference letters.
  • A statement duly signed on the mobility rules, availability, and conflicts of interest.

Applicants can apply to only one of the available PhD projects. Multiple applications by one candidate will automatically make all of his/her applications ineligible.

Selection process

The applications will be analysed by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence, experience, and motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.

All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.


The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit-based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.

Additional comments

The research unit Chemistry Biology Innovation (CBI), headed by Jérôme Bibette, gathers 5 teams: the laboratory of BioChemistry (LBC), the laboratory of Colloids and Divided Materials (LCMD), the laboratory of Evolutionary Genetics (LGE), the laboratory of Sciences of Analytics, Bioanalytics and Miniaturisation (LSABM) and the laboratory of Innovative Materials for Energy (MIE). The unit is highly interdisciplinary, from soft matter physics to chemistry, biochemistry and evolutionary biology, aiming for fundamental knowledge and technological innovation, which led to numerous spin-offs and start-ups (Raindance Technologies, Capsum, Biomillenia, HifiBio, ...). The project will be hosted in the Laboratory of BioChemistry in the team of Philippe Nghe. The team studies the origin of life and development of biological complexity, combining experimental and theoretical approaches, notably using RNA.


ESPCI - PSL is a leading French “Grande Ecole” founded in 1882, part of Université PSL, educating undergraduate and graduate students through a programme merging basic science and engineering, as well as a world-renowned research institution. ESPCI - PSL has set up a tradition of excellence in research, with distinguished faculty that have contributed to its history, such as Pierre and Marie Curie, Paul Langevin, Frédéric Joliot-Curie, Pierre-Gilles de Gennes and Georges Charpak. The five Nobel laureates in this list are emblematic of the exceptional ethos embodied in the permanent culture of excellence at ESPCI Paris.

ESPCI - PSL hosts 11 research units, all associated with CNRS and/or INSERM and/or other Parisian universities in the form of joint research units, covering the fields of physics, chemistry and biology. Favouring interdisciplinarity and operating at the frontier between fundamental research and innovation are two major objectives of ESPCI - PSL. This is achieved through a flexible organisation (without departments) that ensures cross-fertilization between scientific disciplines, as well as a direct connection between basic science and applications. One of ESPCI - PSL’s distinctive features is that it carries out fundamental research into areas of major interest to industry, while developing various approaches to practical industrial problems through a deep, fundamental understanding of the mechanisms at play. Performing fundamental research while keeping an eye on applications enables ESPCI - PSL research scientists to make an impact at multiple levels.

Over the last 10 years, scientists at ESPCI - PSL have published more than one scientific paper a day, applied for one patent a week, and created several technology-driven start-ups every year.


Required Research Experiences

    1 - 4

Offer Requirements

    Chemistry: Master Degree or equivalent
    ENGLISH: Excellent


The proposed project is highly interdisciplinary and international. Candidates should ideally have a background in one or several of the following disciplines: bioinformatics, computer science, data science, applied mathematics, biochemistry, biophysics, evolutionary biology.


In any case, the candidate is expected to learn novel techniques from the different advisors and laboratories, and interface experiments and computational approaches.


Therefore, teamwork, adaptability and a strong motivation for interdisciplinary and multicultural environments are a must.

Work location(s)
1 position(s) available at
Chemistry Biology Innovation Institute, ESPCI - PSL
10, rue Vauquelin

EURAXESS offer ID: 579866

