PhD position 23 – MSCA COFUND, AI4theSciences (PSL, France) - “Machine Learning for biodiversity monitoring”

This job offer has expired

    Université PSL
    Environmental science › Ecology
    First Stage Researcher (R1)
    26/02/2021 23:00 - Europe/Brussels
    France › Montpellier
    H2020 / Marie Skłodowska-Curie Actions COFUND


“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European research and innovation programme Horizon 2020 - Marie Skłodowska-Curie Actions, AI4theSciences is designed to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to help them master the latest Artificial Intelligence and Machine Learning technologies that apply to their own field.

26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.


Description of the PhD subject: “Machine Learning for biodiversity monitoring”


Context - Motivation

The current biodiversity crisis urgently demands novel indicators of ecosystem health to monitor how human activities influence the biosphere. Conventional bio-monitoring relies mainly on morpho-taxonomic identification of a limited number of indicator taxa. Taxonomic identification is labor intensive and requires taxonomic expertise, which leads to long time lags between sampling and reporting of results, high costs, a slow process that cannot be automated, and low potential for upscaling to detect significant biodiversity changes at global scale. While methods based on remote sensing allow instantaneous observation of habitats and physical descriptors of the environment, similar standardized monitoring of biodiversity over equivalently large scales has been impossible to date.

Fortunately, our ability to rapidly generate community inventories is growing with the emergence of new technologies for biodiversity surveys. Notably, next generation sequencing in the context of environmental genomics has given rise to the metabarcoding of environmental DNA (eDNA). Organisms living in an ecosystem shed tissue material, and eDNA metabarcoding reads the resulting DNA sequences to offer an integrative view of ecosystem composition. Compared to conventional survey methods, eDNA has demonstrated higher detection capacity and cost effectiveness while causing no ecosystem disturbance, and it can measure the full range of biodiversity, from prokaryotes to eukaryotes. Additionally, eDNA is capable of detecting rare, cryptic and invasive species, which are often missed in classical surveys. Given the low cost of field collection and the decrease in sequencing costs over recent years, this approach can be scaled up to monitor many different locations at a very high temporal frequency.

However, eDNA represents massive sequencing data (about 20 GB for 1 billion sequences) with a complex structure, and converting eDNA data into indicators that can be easily handled by policy makers is paramount for wide application of this technology. In particular, processing of eDNA data from different locations should make it possible to score ecological quality among sites, e.g. for conservation prioritization. Similarly, processing of eDNA monitoring time series should provide early warning signals of ecosystem degradation to prompt management actions, as well as the first signals of invasive species. Processing eDNA data into relevant indicators of ecosystem health represents a computational challenge, which currently limits the deployment of observatory networks using this technology. The qualitative and efficient development of data processing pipelines has not matched the quantitative increase in ‘omics’ data. The main factor limiting large-scale application to ecosystem monitoring is that an integrative and standardized pipeline for handling and processing such large amounts of information into relevant ecosystem indicators is not yet available. Classic bioinformatic pipelines have been developed to process eDNA from raw reads into synthetic ecosystem information, but they generally lack automation, are time consuming and have limited throughput capacity. Moving toward large-scale eDNA monitoring would require a processing pipeline that can transform eDNA sequence data into meaningful and standardized ecological indicators, useful for management, in a fast and automated way.


Interest of artificial intelligence/big data processing.

The processing of eDNA sequences could be boosted by recent developments in artificial intelligence related to supervised and unsupervised machine learning for complex classification tasks. Machine learning could be embedded within an eDNA processing pipeline at two main steps:

  1. deep learning to speed up and improve the precision of the assignment of eDNA reads to a known taxon from a reference database,
  2. machine learning to learn the relationship between eDNA composition and ecosystem indicators that are relevant for management, to enhance the decision process.
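
As a rough illustration of step 1, the sketch below assigns a read to the closest taxon in a toy reference set. It is not a deep network: a nearest-centroid classifier on k-mer frequencies stands in for the learned model, and the taxon names, sequences and k-mer length are all hypothetical.

```python
from collections import Counter
from itertools import product
import math

K = 3  # k-mer length, an illustrative choice
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_profile(seq, k=K):
    """Return the normalised k-mer frequency vector of a DNA sequence."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in KMERS) or 1
    return [counts[kmer] / total for kmer in KMERS]

def centroid(profiles):
    """Average a list of equally sized frequency vectors."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]

def train(reference):
    """Build one centroid per taxon from {taxon: [reference sequences]}."""
    return {
        taxon: centroid([kmer_profile(s) for s in seqs])
        for taxon, seqs in reference.items()
    }

def assign(read, model):
    """Assign a read to the taxon whose centroid is nearest (Euclidean)."""
    profile = kmer_profile(read)
    return min(model, key=lambda taxon: math.dist(profile, model[taxon]))

# Hypothetical reference database with two taxa (toy sequences).
reference = {
    "taxon_A": ["ACGTACGTACGTACGT", "ACGTACGTACGAACGT"],
    "taxon_B": ["GGGCCCGGGCCCGGGC", "GGGCCCGGGCCTGGGC"],
}
model = train(reference)
print(assign("ACGTACGTTCGTACGT", model))
```

In a real pipeline the reference would be a curated barcode database and the classifier would be trained and validated on millions of reads; the sketch only shows the shape of the task (sequence in, taxon label out).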

Attempts to use machine learning are rare and usually not fully integrated in bioinformatic pipelines. A recent attempt to use machine learning with metabarcoding DNA data showed the potential of neural networks for taxonomic classification from DNA metabarcoding (Busio et al. 2018). Other uses of machine learning in the context of eDNA have attempted to learn the link between eDNA taxonomic unit composition and indicators of water quality (Cordier et al. 2018). Despite those recent examples, the use of machine learning for eDNA remains in its infancy, and dedicated research is needed to develop the necessary tools for the field. The development of data processing and machine learning tools for eDNA science would help bolster this technology for national, international and global monitoring networks.
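
To make the idea of linking taxonomic unit composition to an indicator concrete, here is a minimal sketch using k-nearest-neighbour regression on Bray-Curtis dissimilarity. This is an illustrative stand-in, not the method of the studies cited above; the read counts, quality scores and function names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0

def predict_indicator(sample, train_x, train_y, k=2):
    """Predict an indicator as the mean over the k most similar sites."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: bray_curtis(sample, train_x[i]),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical training data: rows are sites, columns are read counts
# for four taxonomic units.
train_x = [
    [120, 30, 0, 5],
    [100, 40, 2, 8],
    [5, 2, 90, 60],
    [0, 4, 110, 70],
]
# Hypothetical ecological quality score per site, between 0 and 1.
train_y = [0.9, 0.8, 0.2, 0.1]

new_site = [110, 35, 1, 6]
score = predict_indicator(new_site, train_x, train_y, k=2)
print(round(score, 2))
```

The new site resembles the first two training sites, so its predicted score is their average. A production model would need far more sites, normalisation for sequencing depth, and proper cross-validation.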


Scientific Objectives, Methodology & Expected results

The objective of the thesis is to harness a combination of machine learning approaches to support the development of a fast data pipeline that transforms eDNA metabarcoding data into ecological indicators for ecosystem monitoring. The PhD student will develop machine learning methods to improve the identification of the taxonomic composition of eDNA samples and to link eDNA composition to ecological indicators. Approaches will be developed from two case studies. The first concerns freshwater fishes from French Guiana. During field collections (2014–2019), environmental variables were recorded at 400 sites, together with the filtering of water for eDNA. This represents an average of 2,000,000 sequence reads (paired-end Illumina) per sample. In parallel, an eDNA reference database and a database of species functional traits were created for most species potentially detected in the eDNA samples.

The second case study is a global marine dataset gathering 450 samples collected at 29 sites across all the oceans, from the Antarctic to tropical seas, for which fish eDNA has already been sequenced for the Teleo1 fragment of the 12S rRNA gene and is directly available for analysis. This generated an average of 2,000,000 sequence reads (paired-end Illumina).


International mobility

The project involves a collaboration between 3 labs (CEFE, EPHE - PSL; ETH Zürich-WSL Birmensdorf; TIMC-IMAG, Université Grenoble-Alpes) with complementary skills (ecology, genomics, modelling, artificial intelligence).

The student will be based at the CEFE in Montpellier, will spend 10 months in Prof. Pellissier's lab at ETH Zurich, and will travel for short stays at Prof. François's lab (UJF-IMAG).

The data analyzed have been sampled across the world and involve various international collaborations. The datasets are shared through a collaboration with the company SPYGEN, based in Le Bourget du Lac, France.


Thesis supervision

Stéphanie Manel and Loïc Pellissier



Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).

More Information


  • Opportunity to conduct academic research in a top 100 university in the world.
  • High-quality doctoral training leading to a PhD degree, prepared within Ecole Pratique des Hautes Etudes - PSL and awarded by PSL.
  • Access to cutting-edge infrastructures for research & innovation.
  • Appointment for a period of 36 months (job contract issued by the involved component school of PSL) based on a total employer cost of 3100 € per month (including employer tax), or a gross salary of approximately 2228 € per month.
  • Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64 h/year for teaching, 32 days/year for specific missions).
  • Short stay(s) or secondment in France or abroad are expected.
  • An international environment supported by the adherence to the European Charter & Code.
  • Access to an AI training package, with a strong interdisciplinary focus, together with a Career Development Plan.

Eligibility criteria

  • Applicants must have a Master’s degree (or be in the process of obtaining one) or a university degree equivalent to a European Master’s (5-year duration) to be eligible at the deadline of the relevant call.
  • There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
  • Applicants must declare to be available to start the programme on schedule.

To submit your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...


The online application should contain the following documents:

  • English translations of the transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
  • International curriculum vitae and a cover letter explaining the applicant's reasons for pursuing a PhD, why they are applying to this offer, and their professional project (guidelines will be provided to help applicants write the letter).
  • Two academic reference letters.
  • A statement duly signed on the mobility rules, availability, and conflicts of interest.

Applicants can apply to only one of the available PhD projects. Multiple applications from one candidate will automatically make all of that candidate's applications ineligible.

Selection process

The applications will be checked by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence, experience, and motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.

All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.


The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.

Additional comments

The CEFE is currently the largest French research center in Ecology and Evolutionary Ecology. Its main and fundamental mission is to perform independent, fundamental scientific research on the dynamics of biodiversity, planetary environmental change, and sustainable development.


The École Pratique des Hautes Études - PSL (EPHE - PSL), established in the Sorbonne in 1868, is acknowledged as one of France’s ‘grands établissements’ where research is undertaken in Life and Earth Sciences, Historical and Philological Sciences, and Religious Sciences.

With 44 research teams, 1,200 students and 270 professors, EPHE - PSL is established in different locations all over the French territory, including French Polynesia. EPHE – PSL is a founding member of Campus Condorcet.

Required Research Experiences

    Computer science
    1 - 4 years

Offer Requirements

    Computer science: Master Degree or equivalent
    ENGLISH: Excellent


  • Training in Machine learning, Modelling, Statistics, and/or in Ecology or Genomics with a strong interest in machine learning

Work location(s)
1 position(s) available at
Centre d’Écologie Fonctionnelle et Évolutive, EPHE - PSL
1919, route de Mende

EURAXESS offer ID: 579100


The responsibility for the jobs published on this website, including the job description, lies entirely with the publishing institutions. The application is handled uniquely by the employer, who is also fully responsible for the recruitment and selection processes.

