RESEARCH FIELD: Biological sciences › Biodiversity
RESEARCHER PROFILE: First Stage Researcher (R1)
APPLICATION DEADLINE: 26/02/2021 23:00 - Europe/Brussels
LOCATION: France › Paris
TYPE OF CONTRACT: Temporary
HOURS PER WEEK: 35
OFFER STARTING DATE: 01/09/2021
EU RESEARCH FRAMEWORK PROGRAMME: H2020 / Marie Skłodowska-Curie Actions COFUND
MARIE CURIE GRANT AGREEMENT NUMBER: 945304
“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European innovation and research programme Horizon 2020-Marie Skłodowska-Curie Actions, AI4theSciences is designed to train a new generation of researchers who work at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and master the latest Artificial Intelligence and Machine Learning technologies that apply to their own field.
26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.
Description of the PhD subject: “Creating AI/ML techniques to enhance mechanistic eco-evolutionary computer simulations”
Context - Motivation
Understanding how biological diversity has evolved over geological time scales, as well as the various processes influencing its evolution over shorter time scales and its assembly into ecological communities, is key for predicting the impact of current and future environmental changes on biodiversity, and the associated human, social and economic impact. One approach to addressing this question is to develop general mechanistic eco-evolutionary computer models similar to the global circulation models that describe the climate system. By simulating the diversification of life with these models, and comparing those simulations to empirical data (e.g. patterns of species richness, genetic diversity, or abundances, but also range distributions, phenotypic characteristics, phylogenetic trees and fossil occurrences), we can select the model(s) that best match observed data and infer the parameters of these models. The goal of this research agenda is both to understand the essential eco-evolutionary processes underlying biodiversity dynamics and to build causal predictive models. Researchers have recently started to build such general mechanistic eco-evolutionary computer simulations (hereafter referred to as Global Dynamic Biodiversity Models, or GDBMs). These models could be coupled to the comprehensive biodiversity datasets that are currently being collected at an ever-increasing pace, generating massive data. However, a key missing component is a way to compare and calibrate GDBMs against these massive data. The current project aims to develop efficient Artificial Intelligence (AI) / Machine Learning (ML) techniques that will allow complex eco-evolutionary models to be properly fitted to massive eco-evolutionary data. This will allow us to process these data in a much more comprehensive, mechanistic, and powerful way than is currently possible.
The need to understand and model the processes that shape biodiversity on Earth is pressing, yet the complexity of the ecological, evolutionary and spatial processes involved renders this task particularly challenging. Several computer models already exist that model biodiversity dynamics based on general ecological and evolutionary principles, and that generate, through simulations, predictions for patterns that can be observed in nature. Of those, the simpler model structures have been fitted using statistical approaches such as maximum likelihood or (approximate) Bayesian methods, but applying these methods to more complex mechanistic eco-evolutionary simulations is mathematically and computationally challenging. Thus, there is a clear need to develop inference machinery that would couple the new class of large mechanistic simulations with the available big biodiversity data. Such machinery should be likelihood-free, flexible, fast, accurate and scalable. Machine learning, in particular deep learning, provides the appropriate framework for such a development.
Scientific Objectives, Methodology & Expected results
We will build upon the recent GENeral Engine for Eco-Evolutionary SImulationS on the origins of biodiversity (gen3sys) model, a spatially explicit simulator that generates a series of useful biodiversity patterns (Hagen et al. 2020). Gen3sys is already available as an R package, and the associated paper is in revision at PLoS Biology. Gen3sys models evolution at the population level, an intermediate scale between individual-based models, which are too computationally intensive to be simulated for long periods of time, and lineage-based models, which don’t account for demographic or other intraspecific effects, and don’t generate important features of ecological systems such as species abundances and genetic diversity. In the ‘full’ gen3sys model, populations are characterized by trait values which determine their responses to abiotic and biotic factors. The model simulates eco-evolutionary dynamics via four core processes: speciation, dispersal, trait evolution, and changes in abundances. Speciation occurs when subpopulations have been spatially isolated for a long enough period (the ‘duration’ of speciation). By calibrating the parameters of the core processes (speciation duration, dispersal capacity, rate of trait evolution, and demographic parameters) to data, we can test evolutionary theory and turn the model into a predictive simulation model. Currently, however, there is no statistical machinery that can perform such a calibration in a reasonable computational time, which limits the applicability of gen3sys. The general objective of the project is to develop an AI-based inference machinery for gen3sys.
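As an illustration of the speciation rule described above, the following Python sketch tracks how long a pair of subpopulations has been isolated and triggers speciation once that time reaches the speciation duration. It is not gen3sys code: the function names, the fixed time step, and the choice to reset the isolation clock on reconnection are all assumptions made for this example.

```python
def step_isolation(isolated_for, is_connected, dt=1.0):
    """Advance the isolation clock of a pair of subpopulations by one step.

    Assumption for this sketch: the clock resets to zero on reconnection.
    """
    return 0.0 if is_connected else isolated_for + dt


def first_speciation_step(connectivity, speciation_duration, dt=1.0):
    """Return the index of the step at which speciation occurs, or None.

    `connectivity` is a sequence of booleans, one per time step, stating
    whether the two subpopulations can exchange dispersers at that step.
    """
    isolated_for = 0.0
    for step, is_connected in enumerate(connectivity):
        isolated_for = step_isolation(isolated_for, is_connected, dt)
        if isolated_for >= speciation_duration:
            return step
    return None


if __name__ == "__main__":
    # Connected at step 0, then isolated for three consecutive steps.
    print(first_speciation_step([True, False, False, False], 3))
    # Isolation interrupted by a reconnection: the threshold is never reached.
    print(first_speciation_step([False, False, True, False, False], 3))
```

In this toy version the speciation duration is the single parameter to calibrate; in the project it is one of several core parameters to be inferred from data.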
Machine learning inference
We will train and optimize machine learning methods to infer the correct parameter values from given biodiversity patterns (phylogeny, species distributions etc.). We will consider two approaches:
- The first involves a substantial dimension reduction of the feature space by making informed guesses about which summary statistics of the biodiversity data are informative for the inversion of the models. This will allow us to train standard deep neural networks (DNNs) or other ML methods such as gradient-boosted regression trees to perform the parameter inversion.
- The second approach is more ambitious, and will use the uncompressed, and thus high-dimensional, biodiversity data directly as features for the inversion. In this case, we will build a model from convolutional neural networks (CNNs) and possibly other DNNs for the inversion. We hypothesize that this approach might achieve a higher performance by learning which patterns in the biodiversity data are informative for the inference of model parameters. We will also use explainable AI (xAI) methods to examine which patterns in the data are used by the CNNs to infer the parameters. A technical challenge for using CNNs is that while most biodiversity data can be represented by arrays with meaningful neighborhood relationships, some data (e.g. phylogenies, interaction networks) are better represented by graphs. To solve this problem, we will explore the use of graph CNNs. A fallback is to find grid-type representations of the graph data.
For all models, we will establish appropriate procedures for validation and tuning of model hyperparameters and general pre-processing / feature engineering steps, to make the results as general and reliable as possible.
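The first approach, together with the validation step, can be sketched in a few lines of Python. Everything here is a deliberately simplified assumption: a pure-birth process stands in for the eco-evolutionary simulator, mean log-richness across clades is the hand-picked summary statistic, and a k-nearest-neighbour average replaces the DNNs or gradient-boosted trees named above so that the example runs with the standard library alone.

```python
import math
import random


def simulate_richness(speciation_rate, t_max=5.0, rng=random):
    """Toy stand-in for a simulator: pure-birth process, returns final richness."""
    n_species, t = 1, 0.0
    while True:
        # Waiting time until the next speciation among n_species lineages.
        t += rng.expovariate(n_species * speciation_rate)
        if t > t_max:
            return n_species
        n_species += 1


def summary_statistic(speciation_rate, rng, n_clades=20):
    """Compress one simulated dataset (n_clades clades) into mean log-richness."""
    logs = [math.log(simulate_richness(speciation_rate, rng=rng))
            for _ in range(n_clades)]
    return sum(logs) / n_clades


def make_training_set(n_sims, rng, low=0.2, high=1.0):
    """Draw rates from a uniform prior and simulate one summary for each."""
    pairs = []
    for _ in range(n_sims):
        rate = rng.uniform(low, high)
        pairs.append((summary_statistic(rate, rng), rate))
    return pairs


def knn_estimate(training_set, observed_summary, k=10):
    """Average the rates of the k simulations closest to the observation."""
    nearest = sorted(training_set,
                     key=lambda pair: abs(pair[0] - observed_summary))[:k]
    return sum(rate for _, rate in nearest) / len(nearest)


def mean_absolute_error(training_set, validation_set):
    """Score the estimator on simulations it has not seen."""
    errors = [abs(knn_estimate(training_set, s) - rate)
              for s, rate in validation_set]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    rng = random.Random(42)
    train = make_training_set(500, rng)
    held_out = make_training_set(100, rng)
    print("held-out MAE:", round(mean_absolute_error(train, held_out), 3))
```

The structure carries over to richer settings: simulate under parameters drawn from a prior, reduce each simulation to summaries, fit a regressor from summaries to parameters, and check it on held-out simulations before applying it to empirical data.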
To illustrate the utility of our approach, we will focus on the evolution of mammals over the Cenozoic (i.e. the last 65 Myr). Of any species group, mammals have the most complete data across many dimensions, such as geographic distributions (from the Global Biodiversity Information Facility, GBIF, and the International Union for Conservation of Nature, IUCN), traits, genetic data, phylogenies and fossil occurrences (from the Paleobiology Database, PBDB). Mammals diversified mostly after the last mass extinction event ~65 Myr ago, and the geographic configuration of the continents through time, which is an input of gen3sys, is well known over the period. We will also benefit from the paleoenvironmental reconstructions of topography, aridity and temperature from Hagen et al. 2020.
By adjusting several versions of gen3sys of increasing complexity to the mammalian data, we will
- gain knowledge on mammalian past history
- calibrate the model such that it can be simulated forward under various projected scenarios of environmental change to predict the response of mammalian communities.
The PhD thesis will be subdivided into three chapters corresponding to three versions of gen3sys of increasing computational complexity. Starting with a study of relatively low complexity will allow us to gain experience and experiment with the general pipeline, the data requirements of the AI algorithms, and their performance in inferring parameters from data.
- Objective 1: We will develop the AI-based inference procedure for a simplified version of gen3sys where populations are not characterized by explicit trait values that influence their response to abiotic and biotic factors. This model (gen3sys_geo) will thus be a spatially explicit model of range evolution and geographic speciation. While species won’t be characterized by explicit trait values, they will be characterized by species-specific core parameters (duration of speciation, dispersal capacities, and demographic rates). In the spirit of a recent model we developed, these species-specific parameters will be inherited at speciation, i.e. drawn from a distribution centered on the parental value.
- Objective 2: We will then develop the AI inference for a ‘trait-based’ version of gen3sys (gen3sys_traits) where populations are characterized by explicit evolving trait values that can influence their core parameters. Here again, species-specific rates of trait evolution will be inherited at speciation.
- Objective 3: Finally, we will develop the AI inference for the complete gen3sys model (gen3sys_full), which includes a ‘niche-based’ version of gen3sys where trait values determine how abiotic factors modulate changes in abundances, and a ‘competitive’ version of gen3sys where trait values determine how interactions with other co-existing species modulate changes in abundances.
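The inheritance of species-specific parameters at speciation, mentioned in Objectives 1 and 2, can be illustrated with a short Python sketch. The text above only says that daughter values are drawn from a distribution centered on the parental value; the log-normal perturbation used here (centered on the parental value on the log scale, which keeps parameters positive), the `sigma` parameter, and all names are assumptions made for this example.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesParams:
    """Species-specific core parameters (hypothetical container)."""
    speciation_duration: float
    dispersal_capacity: float
    demographic_rate: float


def inherit(parent, sigma=0.1, rng=random):
    """Draw a daughter species' parameters around the parental values.

    Assumption: each value is multiplied by exp(N(0, sigma)), so the
    daughter value has the parental value as its median and stays positive.
    """
    def perturb(value):
        return value * math.exp(rng.gauss(0.0, sigma))

    return SpeciesParams(
        speciation_duration=perturb(parent.speciation_duration),
        dispersal_capacity=perturb(parent.dispersal_capacity),
        demographic_rate=perturb(parent.demographic_rate),
    )


if __name__ == "__main__":
    rng = random.Random(7)
    ancestor = SpeciesParams(speciation_duration=2.0,
                             dispersal_capacity=50.0,
                             demographic_rate=0.3)
    # Follow one lineage through three successive speciation events.
    lineage = [ancestor]
    for _ in range(3):
        lineage.append(inherit(lineage[-1], rng=rng))
    for species in lineage:
        print(species)
```

Under this scheme `sigma` controls how quickly parameters drift along a lineage, and it would itself be a quantity to infer from data.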
The PhD student will be based at IBENS, in Paris (France). He/she will spend 1½ months at ETH in Zurich (Switzerland) at the beginning of the project, to become familiar with gen3sys. He/she will also spend 1½ months at the University of Regensburg (Germany) for each of the three AI inference developments.
Hélène Morlon and Florian Hartig
Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).
- PhD project, subject to the availability of funding -
- Opportunity to conduct academic research in a top 100 university in the world.
- High-quality doctoral training rewarded by a PhD degree, prepared within Ecole Normale Supérieure - PSL and delivered by PSL.
- Access to cutting-edge infrastructures for research & innovation.
- Appointment for a period of 36 months (job contract delivered by the involved component school of PSL) based on a salary of 3100 € gross employer (including employer tax) per month or approximately a 2228 € gross salary per month.
- Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64 h/year for teaching, 32 days/year for specific missions).
- Short stay(s) or secondment in France or abroad are expected.
- An international environment supported by the adherence to the European Charter & Code.
- Access to AI training package, with a strong interdisciplinary focus, together with a Career development Plan.
- Applicants must have a Master’s degree (or be in the process of obtaining one) or have a University degree equivalent to a European Master’s (5-year duration) to be eligible at the time of the deadline of the relevant call.
- There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
- Applicants must declare to be available to start the programme on schedule.
For submitting your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...
The online application should contain the following documents:
- English-translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
- International curriculum vitae and a cover letter explaining the applicant's reasons for preparing a PhD, why he/she is applying to this offer, and his/her professional project (guidelines will be given to applicants to help them write the letter).
- Two academic reference letters.
- A statement duly signed on the mobility rules, availability, and conflicts of interest.
Applicants can only apply to one PhD project among the available ones. Multiple applications by one candidate will automatically make all his/her applications ineligible.
The applications will be analysed by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence, experience, and motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.
All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.
The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.
The Institute of Biology of the Ecole Normale Supérieure (IBENS) is an internationally recognized center for fundamental biological research, triply affiliated to the National Center for Scientific Research (CNRS), the National Health and Medical Research Institute (Inserm), and the Ecole Normale Supérieure (ENS - PSL), a top-ranking higher education establishment belonging to Université PSL.
Hélène Morlon leads the «Modeling Biodiversity» team at IBENS. IBENS is a center for fundamental research that leads innovative projects aimed at deciphering the essential mechanisms and principles that rule living systems. The science done at IBENS covers four major areas, one of which (the one that the team belongs to) is Ecology and Evolutionary Biology. Hélène Morlon’s team is also part of the Computational Biology Center of IBENS and affiliated to the Centre for Interdisciplinary Research in Biology (CIRB) at the Collège de France.
The Ecole Normale Supérieure - PSL is a leading multidisciplinary institution that focuses on training through research. The ENS - PSL defines and applies scientific and technological research policies from a multidisciplinary and international perspective, and maintains close relationships with prestigious partners in France and abroad. It encompasses fourteen teaching and research departments, spanning the main disciplines in the humanities and sciences. The ENS - PSL currently has a staff of almost 800 lecturers, ENS - PSL, CNRS or associated researchers, and post-doc researchers. Within its Departments, the ENS - PSL includes 40 research units identified as ENS - PSL, INSERM or INRIA, encompassing ENS - PSL and CNRS agents as well as 300 foreign researchers and 650 doctoral students. The ENS - PSL respects the principles of the European Charter and Code for Researchers and is engaged in the HRS4R certification.
Required Research Experiences
RESEARCH FIELD: Computer science
YEARS OF RESEARCH EXPERIENCE: 1 - 4
REQUIRED EDUCATION LEVEL: Computer science: Master Degree or equivalent
REQUIRED LANGUAGES: ENGLISH: Excellent
- Computer scientist with a strong interest in Ecology/Evolution, or a biologist with additional training in Computer science.
- Ability to code in C would be desirable for modifications of the gen3sys core code.
EURAXESS offer ID: 581234