RESEARCH FIELD: Chemistry › Biochemistry
RESEARCHER PROFILE: First Stage Researcher (R1)
APPLICATION DEADLINE: 26/02/2021 23:00 - Europe/Brussels
LOCATION: France › Paris
TYPE OF CONTRACT: Temporary
HOURS PER WEEK: 35
OFFER STARTING DATE: 01/09/2021
EU RESEARCH FRAMEWORK PROGRAMME: H2020 / Marie Skłodowska-Curie Actions COFUND
MARIE CURIE GRANT AGREEMENT NUMBER: 945304
“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European innovation and research programme Horizon 2020-Marie Skłodowska-Curie Actions, AI4theSciences is uniquely shaped to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to help them master the latest Artificial Intelligence and Machine Learning technologies that apply to their own field.
26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.
Description of the PhD subject: “Artificial Intelligence to Decode the Genomic Replication Programme of Human Cells”
Context - Motivation
DNA replication is at the heart of genome stability and transmission of genetic and epigenetic information. Genome replication kinetics is influenced in a complex manner by transcriptional activity, chromatin structure, 3D folding, nuclear architecture and many other positive and negative factors. We have developed several cell-population and single-molecule biochemical methods and mathematical models to understand how the spatiotemporal programme of human DNA replication emerges from the stochastic activation of replication origins and progression of replication forks in normal and cancer cells of different developmental origin.
Within this context, we will use Artificial Intelligence for two distinct aims:
- decode the raw electrical signal obtained by nanopore sequencing of DNA replication intermediates, labeled with nucleoside analogs during replication, to reveal the full extent of cell-to-cell variation in replication origin usage and fork progression;
- use neural networks to extract from experimental genomic replication profiles the landscape of origin activation probability along the genome, and to assess how this probability is encoded in genetic and/or epigenetic features of different eukaryotic cell types. Ultimately, these results will uncover how the diffusion of replication factors within different, complex epigenetic chromatin conformations determines the fundamental parameters of DNA replication kinetics and their changes during tumour progression, during organismal development and cell differentiation, and during evolution.
Genome duplication (or DNA replication) is a fundamental process of living beings that allows the faithful transmission of genetic information through cell divisions. In order to proliferate, any cell must copy its genome and distribute a copy to each daughter cell. Many antibacterial, antiviral or anticancer drugs act by blocking this process. On the other hand, copying errors can lead to cancer or genetic diseases. Finally, biotechnologies and gene therapy require control over the replication of DNA artificially introduced into cells. Thus, the fundamental, medical, biotechnological and societal implications of DNA replication are wide-ranging. Replication takes place at "replication forks" where the two DNA strands are separated then copied. In eukaryotic organisms, strand separation starts at many points in the genome termed "replication origins". Local strand separation produces a pair of replication forks that progress in opposite directions. Forks emitted by neighbouring origins then converge and meet at "termination" sites. The mechanisms that regulate the number, location and activation time of replication origins, and the rate of replication fork progression, which together constitute the spatiotemporal programme of DNA replication, pose some of the most challenging questions of modern molecular biology.
Molecular mechanisms of DNA replication
In the yeast S. cerevisiae, replication origins are encoded by specific DNA sequences that provide binding sites for the Origin Recognition Complex (ORC). During the G1 phase of the cell cycle, ORC loads the MCM2-7 complex, the core motor of the replicative helicase, in an inactive double hexameric (DH) form around double-stranded DNA, in preparation for DNA replication. MCM-DHs are then activated in S phase to initiate bidirectional DNA unwinding and recruit DNA polymerases and accessory factors that together establish two divergent DNA replication machineries (replisomes) at each pair of replication forks. Importantly, more MCM DHs are loaded onto chromatin in G1 than are activated in S phase. MCMs that fail to activate are displaced from chromatin during passive replication. Origin activation factors are in limiting amounts, suggesting that origin activation time is regulated by the efficiency with which origins compete with each other for these factors. Epigenetic modifications, genome topology and nuclear architecture can all influence recruitment of limiting factors. A single-molecule, optical analysis of yeast chromosome VI replication found that no two chromosomal copies in a clonal population replicate in an identical manner and first suggested that origins fire stochastically and independently of each other. This was later confirmed by mathematical analysis of genome-wide, ensemble replication profiles of high spatial (1 kb) and temporal (5 min) resolution.
Although ORC and MCM and other replication factors are conserved throughout eukaryotes, no specific sequences that specify replication origins could be identified in higher eukaryotes. Instead, many studies have revealed that the spatiotemporal programme of DNA replication is cell-type specific and regulated by epigenetic modifications, rather than by precise DNA sequence motifs. Epigenetic regulation allows more developmental flexibility than a rigid mechanism based on strict DNA sequence requirements. Single-molecule and ensemble replication profiling experiments have revealed considerable cell-to-cell heterogeneity, although non-random patterns clearly emerge in population averages. An important question subject to current debate is whether adjacent origins are activated independently or in a correlated manner. Genome duplication not only entails faithful DNA replication but also reassembly of fully chromatinized sister chromatids. Nucleosomes and other protein-DNA complexes are disrupted by replication fork traversal. However, epigenetic modifications that ensure cell-type specific gene expression programs must be rapidly reconstituted in the wake of replication forks if a dividing cell is to produce two identical daughter cells. On the other hand, chromatin disruption at the fork may also provide an opportunity to assemble alternative chromatin structures during cellular differentiation, and this may depend on the time and/or direction of fork passage. Thus, chromatin structure and replication can potentially influence each other in a complex manner, a topic of current intense investigation.
Scientific Objectives, Methodology & Expected results
Methodologies to map genomic replication
To address some of the above questions, we have developed several cell-population and single-molecule methods to purify and sequence at high throughput replicating DNA molecules labeled with nucleotide analogs during S phase (the cell cycle period when DNA is replicated). Repli-seq identifies the genomic position of newly replicated DNA at different stages of S phase in cultured cell populations and thereby establishes mean replication timing (MRT) profiles. OK-seq is a complementary method that measures the proportions of rightward- (R) and leftward- (L) moving forks that replicate any locus in a cell population. The resulting replication fork directionality (RFD = R - L) profiles make it possible to deduce the position of predominant initiation and termination zones in the population. Both Repli-seq and OK-seq results imply a strong cell-to-cell heterogeneity in replication patterns, although stable statistical tendencies do emerge in cell populations. We have now profiled a dozen different human cell lines by these methods. We defined genomic regions that replicate in either a cell-type specific or a constitutive manner, and the next challenge is to relate these data to the distinct epigenetic profiles of these multiple cell lines in order to understand the rules governing origin location and activation time.
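As an illustration of the RFD = R - L definition above, the following minimal Python sketch computes an RFD profile from per-bin fork counts and labels each step between adjacent bins by the sign of the RFD change. It assumes that R and L are proportions, so raw counts are divided by their sum, and that an ascending RFD marks predominant initiation while a descending RFD marks predominant termination. The function names, the binning and the counts are hypothetical, and this is not the team's actual OK-seq pipeline.

```python
def rfd_profile(right_counts, left_counts):
    """Return RFD = R - L per bin, with R and L taken as proportions."""
    profile = []
    for r, l in zip(right_counts, left_counts):
        total = r + l
        # A bin without any read carries no directionality information.
        profile.append(None if total == 0 else (r - l) / total)
    return profile


def classify_steps(profile):
    """Label each step between adjacent bins by the sign of the RFD change."""
    labels = []
    for a, b in zip(profile, profile[1:]):
        if a is None or b is None:
            labels.append("unknown")
        elif b > a:
            labels.append("initiation")   # forks diverge: RFD rises
        elif b < a:
            labels.append("termination")  # forks converge: RFD falls
        else:
            labels.append("flat")
    return labels


# Hypothetical fork counts in seven consecutive genomic bins.
right = [10, 20, 50, 80, 90, 60, 30]
left = [90, 80, 50, 20, 10, 40, 70]
rfd = rfd_profile(right, left)
labels = classify_steps(rfd)
print(rfd)
print(labels)
```

A real analysis would smooth the profile and call zones over many bins rather than single steps; the sketch only shows how the sign of the RFD slope relates to initiation and termination.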
We recently developed FORK-seq, a nanopore sequencing method to map replication of single DNA molecules at 200 bp resolution. During nanopore sequencing, a native, single DNA strand is translocated base-by-base through a membrane-inserted nanopore through which an ionic current flows. The current is influenced by the 5 bases present in the narrowest portion of the nanopore, so that each base is read in the context of 5 successive pentamers. Decoding the sequence of currents into a sequence of bases is therefore a non-trivial task typically solved with neural networks. Recent developments include the discrimination of naturally modified bases, such as 5-methyl cytosine from cytosine, and detection of artificially modified bases incorporated during replication. We thus used a convolutional neural network to discriminate thymidine from its analog bromodeoxyuridine (BrdU), following pulse-chase incorporation of BrdU along DNA replication intermediates in S. cerevisiae. We were thus able to detect and orient BrdU incorporation tracts, allowing us to reproduce ensemble RFD profiles and to rediscover ab initio the known origins and termini of this organism. Furthermore, we discovered dispersive initiation and termination events undetectable by population methods. FORK-seq therefore gives access to the full extent of cell-to-cell heterogeneity in DNA replication patterns. We now intend to further use this method to precisely measure fork progression, molecule by molecule, genome-wide in S. cerevisiae and to extend this approach to human cells. Some of these objectives may require detecting the consecutive incorporation of multiple thymidine analogs and not just BrdU.
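The statement that each base is read in the context of 5 successive pentamers can be made concrete with a short Python sketch. It lists the 5-mers that contain a given position of a sequence, which is why a single T-to-BrdU substitution is expected to affect up to five consecutive current levels. The function name and the example sequence are hypothetical.

```python
def pentamer_contexts(sequence, index, k=5):
    """Return the k-mers (k = 5 by default) that contain the base at `index`."""
    contexts = []
    for start in range(index - k + 1, index + 1):
        # Keep only windows that lie entirely inside the sequence.
        if start >= 0 and start + k <= len(sequence):
            contexts.append(sequence[start:start + k])
    return contexts


seq = "ACGTTGCAAT"
contexts = pentamer_contexts(seq, 4)  # the T at 0-based position 4
print(contexts)
```

Bases close to either end of a read fall in fewer than five complete windows, which the bounds check handles.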
Need for Artificial Intelligence and Big Data Processing
Artificial Intelligence and big data processing are needed for two purposes. The first one is to decode the raw electrical nanopore sequencing signal into a base sequence, including the discrimination between thymidine (T) and its nucleoside analogs bromo-, iodo- and chloro-deoxyuridine (BrdU, IdU, CldU). In addition, this could be combined with the discrimination of unmethylated and methylated cytosine, an important epigenetic modification of the DNA itself. Such a tool may be used to understand how cytosine methylation is propagated or not following replication fork passage.
We have so far developed a method that can monitor BrdU incorporation with a resolution of 200 base pairs and we estimate that the resolution can still be improved by a factor of ten. Decoding an electrical current into a DNA sequence is very similar in nature to decoding an audio recording into its text content, or more generally to any speech-related task, which neural networks are achieving with increasing accuracy. As a first task (T1.1), we propose to apply the latest developments in neural network technology for speech-related tasks, such as LSTM or transformer architectures, to include attention mechanisms on a larger part of the reads, in order to increase the resolution of our method and to discriminate more than one DNA modification. While including larger parts of the reads to increase resolution may seem contradictory, the electrical currents coming from nanopore experiments are subject to large global shifts and scaling that interfere with the decoding process.
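To illustrate why global shift and scaling matter, the sketch below applies a per-read median and median-absolute-deviation (MAD) normalization. This is one common way to remove a read-wide offset and gain, shown here as an assumption rather than as the method planned in T1.1. The function name, the sample values and the distortion parameters are hypothetical.

```python
import statistics


def normalize_signal(samples):
    """Remove a read-wide offset and gain using the median and the MAD."""
    center = statistics.median(samples)
    mad = statistics.median([abs(x - center) for x in samples])
    if mad == 0:
        raise ValueError("signal has no spread; cannot rescale")
    return [(x - center) / mad for x in samples]


# Hypothetical current samples, then the same read after an arbitrary
# global scaling (x1.3) and shift (+12).
raw = [80.0, 95.0, 110.0, 90.0, 100.0]
distorted = [1.3 * x + 12.0 for x in raw]

norm_raw = normalize_signal(raw)
norm_distorted = normalize_signal(distorted)
print(norm_raw)
print(norm_distorted)
```

Both reads map to the same normalized values up to floating-point rounding, so a decoder fed normalized signal is not confused by this kind of distortion. Estimating the offset and gain reliably needs a long enough stretch of the read, which is consistent with the point above about looking at larger parts of the reads.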
Performing the well-controlled biological experiments needed to gather training sets of high quality is an important bottleneck. We thus plan (task T1.2) a new, distinct contribution of AI to predict, from the chemical structure of a base or its nucleoside analogs, the expected electrical current during nanopore sequencing. If successful, this will dramatically decrease the number of experiments needed to train a neural network to detect nucleoside analogs, whether to monitor DNA replication or for other applications. This ambitious goal will be pursued following the successful use of graph neural networks to predict diverse chemical properties from chemical structures.
The second purpose of using AI is to use neural networks to extract quantitative information about the DNA replication program from multiple, big experimental datasets. The initial problem (task T2.1) consists in recovering the landscape of replication initiation probability along the genome from the experimental MRT and RFD profiles. To do so, neural networks are trained on replication profiles simulated from synthetic initiation landscapes. Once this 'inverse' problem is solved, one can ask whether this initiation probability is epigenetically encoded using available epigenetic datasets (histone post-translational modifications and/or transcriptional activity and factor binding and/or 3D genome folding). The problem is not trivial because current research suggests that multiple epigenetic modifications synergize in some unknown combinatorial manner to promote replication initiation. Therefore, 1D epigenetic profiles will be combined using AI models to predict the initiation density landscape (task T2.2). The capacity of such a model to predict the initiation landscape is not necessarily sufficient to fully extract new biological knowledge, i.e. to decipher the rules governing the specification of the replication landscape by epigenetic marks. The project will thus focus on explainable AI techniques at this stage.
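The forward direction of task T2.1, simulating replication profiles from a synthetic initiation landscape, can be sketched with a toy stochastic model in Python. The assumptions are loud ones: the genome is a row of bins, every unreplicated bin fires independently with a fixed probability per time step, forks move at one bin per step, and converging forks stop when they meet. All names and parameters are hypothetical, and this is not the model used by the supervising teams.

```python
import random


def simulate_cell(init_prob, rng, max_steps=100_000):
    """Replicate one genome copy; return per-bin time and fork direction."""
    n = len(init_prob)
    if not any(p > 0 for p in init_prob):
        raise ValueError("at least one bin needs a non-zero firing probability")
    rep_time = [None] * n
    direction = [0] * n  # +1 rightward fork, -1 leftward fork, 0 origin bin
    right_forks, left_forks = [], []
    for t in range(max_steps):
        if all(time is not None for time in rep_time):
            return rep_time, direction
        moved_right, moved_left = [], []
        # Existing forks advance by one bin; a fork stops at the genome
        # end or when it reaches an already replicated bin (termination).
        # Rightward forks move first, a small bias accepted in this toy model.
        for pos in right_forks:
            nxt = pos + 1
            if nxt < n and rep_time[nxt] is None:
                rep_time[nxt], direction[nxt] = t, +1
                moved_right.append(nxt)
        for pos in left_forks:
            nxt = pos - 1
            if nxt >= 0 and rep_time[nxt] is None:
                rep_time[nxt], direction[nxt] = t, -1
                moved_left.append(nxt)
        # Unreplicated bins may fire and emit two divergent forks.
        for x in range(n):
            if rep_time[x] is None and rng.random() < init_prob[x]:
                rep_time[x] = t
                moved_right.append(x)
                moved_left.append(x)
        right_forks, left_forks = moved_right, moved_left
    raise RuntimeError("genome not fully replicated within max_steps")


def mean_profiles(init_prob, n_cells=200, seed=0):
    """Average many simulated cells into MRT and RFD (= R - L) profiles."""
    rng = random.Random(seed)
    n = len(init_prob)
    time_sum, dir_sum = [0] * n, [0] * n
    for _ in range(n_cells):
        rep_time, direction = simulate_cell(init_prob, rng)
        for x in range(n):
            time_sum[x] += rep_time[x]
            dir_sum[x] += direction[x]
    mrt = [s / n_cells for s in time_sum]
    rfd = [s / n_cells for s in dir_sum]
    return mrt, rfd


# Synthetic landscape: a single origin in the middle of 41 bins.
landscape = [0.0] * 41
landscape[20] = 0.2
mrt, rfd = mean_profiles(landscape)
print(rfd[0], rfd[20], rfd[40])
```

With a single origin, every bin to its left is replicated by a leftward fork (RFD of -1) and every bin to its right by a rightward fork (RFD of +1), while MRT grows linearly with distance from the origin. Pairs of (landscape, MRT, RFD) generated this way from many random landscapes are the kind of training data the inverse problem would need.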
The project will clearly require mobility between the Paris and Lyon ENS - PSL partner labs. International mobility will be considered if our future findings are of interest to our current international collaborators. The conference presentations and publications of OK-seq and FORK-seq have attracted a large number of collaboration requests, some of which have resulted in publications and some of which are ongoing. Similar questions are approached by many high-level laboratories throughout the world, and we have developed several recent or ongoing international collaborations to address them (A. Schepers, Munich; A. Nussenzweig, Bethesda; N. Barkai, Tel-Aviv).
Olivier Hyrien and Benjamin Audit
Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).
- Opportunity to conduct academic research in a top 100 university in the world.
- High-quality doctoral training rewarded by a PhD degree, prepared within Ecole Normale Supérieure - PSL and delivered by PSL.
- Access to cutting-edge infrastructures for research & innovation.
- Appointment for a period of 36 months (job contract delivered by the involved component school of PSL) based on a salary of 3100 € gross employer (including employer tax) per month or approximately a 2228 € gross salary per month.
- Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64 h/year for teaching, 32 days/year for specific missions).
- Short stay(s) or secondment in France or abroad are expected.
- An international environment supported by the adherence to the European Charter & Code.
- Access to AI training package, with a strong interdisciplinary focus, together with a Career development Plan.
- Applicants must have a Master’s degree (or be in the process of obtaining one) or have a University degree equivalent to a European Master’s (5-year duration) to be eligible at the time of the deadline of the relevant call.
- There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
- Applicants must declare to be available to start the programme on schedule.
For submitting your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...
The online application should contain the following documents:
- English translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
- International curriculum vitae and a cover letter explaining the applicant's reasons for undertaking a PhD, why he/she is applying to this offer, and his/her professional project (guidelines will be provided to help applicants write the letter).
- Two academic reference letters.
- A statement duly signed on the mobility rules, availability, and conflicts of interest.
Applicants can only apply to one PhD project among the available ones. Multiple applications from one candidate will automatically make all his/her applications ineligible.
The applications will be analysed by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence; experience; motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.
All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.
The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.
The Institute of Biology of the Ecole Normale Supérieure (IBENS) is an internationally recognized center for fundamental biological research, triply affiliated to the National Center for Scientific Research (CNRS), the National Health and Medical Research Institute (Inserm), and the Ecole Normale Supérieure (ENS - PSL), a top-ranking higher education establishment belonging to Université PSL.
The IBENS hosts over 300 people grouped into 30 autonomous teams conducting highly collaborative research on the mechanisms and principles that rule living systems, combining experimental and theoretical approaches with a strong translational potential and technological innovation (patents, hosting of start-ups). Scientific research at IBENS covers four major areas (Genetics and Genomics, Cellular Biology and Development, Neuroscience, Ecology and Evolution) and relies on five technological platforms (Genomics, Proteomics, Imaging, Computational Biology, FabLab) with centralized administrative and IT support. Fostering strong interdisciplinarity amongst its researchers, the IBENS carries a holistic vision of life by integrating its different levels of complexity, from single molecules and cells, to networks and organs, to entire organisms and populations interacting with their environment.
The Ecole Normale Supérieure - PSL is a leading multidisciplinary institution that focuses on training through research. The ENS - PSL defines and applies scientific and technological research policies from a multidisciplinary and international perspective and maintains close relationships with prestigious partners in France and abroad. It encompasses fourteen teaching and research departments, spanning the main disciplines of the humanities and sciences. The ENS - PSL currently has a staff of almost 800 lecturers, ENS - PSL, CNRS or associated researchers, and post-doc researchers. Within its Departments, the ENS - PSL includes 40 research units identified as ENS - PSL, INSERM or INRIA, encompassing ENS - PSL and CNRS agents as well as 300 foreign researchers and 650 doctoral students. The ENS - PSL respects the principles of the European Charter and Code for Researchers and is engaged in the HRS4R certification.
Required Research Experiences
YEARS OF RESEARCH EXPERIENCE: 1 - 4
REQUIRED EDUCATION LEVEL: Mathematics: Master Degree or equivalent
REQUIRED LANGUAGES: ENGLISH: Excellent
- Mathematics, Physics or Bioinformatics Master's degree.
- Strong interest in machine learning and if possible in neural networks.
- A first experience in applied machine learning from a Master 1 or Master 2 internship and/or a proven curiosity for biology will be assets.
EURAXESS offer ID: 580427