16/12/2020

PhD position 20 – MSCA COFUND, AI4theSciences (PSL, France) - “Unsupervized learning of causal graphical models from time-resolved cell biology data”

This job offer has expired


  • ORGANISATION/COMPANY
    Université PSL
  • RESEARCH FIELD
    Computer science › Other
  • RESEARCHER PROFILE
    First Stage Researcher (R1)
  • APPLICATION DEADLINE
    26/02/2021 23:00 - Europe/Brussels
  • LOCATION
    France › Paris
  • TYPE OF CONTRACT
    Temporary
  • JOB STATUS
    Full-time
  • HOURS PER WEEK
    35
  • OFFER STARTING DATE
    01/09/2021
  • EU RESEARCH FRAMEWORK PROGRAMME
    H2020 / Marie Skłodowska-Curie Actions COFUND
  • REFERENCE NUMBER
    AI4theSciences-PhD-20
  • MARIE CURIE GRANT AGREEMENT NUMBER
    945304

OFFER DESCRIPTION

“Artificial Intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European research and innovation programme Horizon 2020 - Marie Skłodowska-Curie Actions, AI4theSciences is designed to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to master the latest Artificial Intelligence and Machine Learning technologies that apply to their own field.

Twenty-six doctoral students will join Université PSL's doctoral schools in two academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. Candidates will be recruited through high-standard HR processes based on transparency, equal opportunities and excellence.

 

Description of the PhD subject: “Unsupervized learning of causal graphical models from time-resolved cell biology data”

 

Context – Motivation

Live cell imaging microscopy and next generation sequencing technologies, now routinely used in cell biology labs and pharmaceutical R&D companies, produce massive amounts of time-lapse images and gene expression data at single-cell resolution. However, this wealth and variety of biological data remains largely underexplored due to the lack of unsupervised methods and tools to analyze it without preconceived hypotheses. This highlights the need to develop new machine learning strategies to better exploit the richness and complexity of the information contained in such massive cell biology data.

In principle, high-throughput technologies can scale up the discovery of novel biological processes, such as gene regulation, cell differentiation, cell-cell interactions or full developmental programs, that characterize living organisms. Yet, probing biological processes systematically through massively parallel perturbations (e.g. using specific drugs, RNA interference, gene knockout or CRISPR-based gene editing) often remains technically impracticable or costly, and sometimes unethical, in many biological contexts. Hence, most state-of-the-art cell biology data actually consist of observational data with just a few control parameters and experimental conditions. As a result, their functional interpretation in terms of biological processes remains challenging, as it requires distinguishing causal relationships from mere correlations.

The Isambert lab recently developed a novel unsupervised method to learn causal graphical models for a broad variety of biological or clinical datasets, from single-cell transcriptomic data (qRT-PCR or RNA-seq), genomic alterations in tumors and protein contact maps (Verny et al. 2017, Sella et al. 2018) to medical records of patients (Cabeli et al. 2020). The method can learn a large class of graphical models including undirected, directed and possibly bidirected edges originating from latent common causes unobserved in the available data. This machine learning approach combines the analysis of multivariate information (Affeldt et al. 2015, 2016, Verny et al. 2017) between mixed-type continuous / categorical variables (Cabeli et al. 2020) with interpretable constraint-based graphical models (Li et al. 2019). In brief, it starts from a complete graph and iteratively removes dispensable edges, by uncovering significant information contributions from indirect paths, while guaranteeing their consistency with the final graph (Li et al. 2019). The remaining edges are then filtered based on their confidence assessment or oriented based on the signature of causality in observational data (Pearl 2009). The resulting method (MIIC) outperforms competing methods on a broad range of benchmark networks, achieving better results with ten to a hundred times fewer samples and running ten to a hundred times faster than state-of-the-art methods (Verny et al. 2017).
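The edge-removal idea described above can be illustrated with a minimal sketch. This is not MIIC itself: it is a deliberately simplified constraint-based stand-in that assumes discrete variables, conditions on at most one other variable, and uses a fixed, hypothetical information threshold instead of MIIC's multivariate information criteria and confidence assessment. All function names and parameter values are illustrative assumptions.

```python
import math
import random
from collections import Counter
from itertools import combinations


def mutual_info(data, i, j, cond=()):
    """Empirical (conditional) mutual information I(X_i; X_j | X_cond), in nats."""
    c_xyz = Counter((r[i], r[j], tuple(r[k] for k in cond)) for r in data)
    c_xz = Counter((r[i], tuple(r[k] for k in cond)) for r in data)
    c_yz = Counter((r[j], tuple(r[k] for k in cond)) for r in data)
    c_z = Counter(tuple(r[k] for k in cond) for r in data)
    n = len(data)
    return sum(
        (n_xyz / n) * math.log(n_xyz * c_z[z] / (c_xz[(x, z)] * c_yz[(y, z)]))
        for (x, y, z), n_xyz in c_xyz.items()
    )


def learn_skeleton(data, n_vars, threshold=0.02):
    """Start from a complete graph and drop every edge whose endpoints share
    (almost) no information, either marginally or given one other variable.
    The threshold is an arbitrary illustrative value, not a statistical test."""
    edges = {frozenset(p) for p in combinations(range(n_vars), 2)}
    for i, j in combinations(range(n_vars), 2):
        others = [k for k in range(n_vars) if k not in (i, j)]
        cond_sets = [()] + [(k,) for k in others]
        if any(mutual_info(data, i, j, c) < threshold for c in cond_sets):
            edges.discard(frozenset((i, j)))
    return edges


# Toy data from a chain X0 -> X1 -> X2 of noisy binary copies.
random.seed(0)


def noisy_copy(bit, p=0.1):
    """Return `bit`, flipped with probability p."""
    return bit ^ int(random.random() < p)


data = []
for _ in range(5000):
    x0 = random.randint(0, 1)
    x1 = noisy_copy(x0)
    x2 = noisy_copy(x1)
    data.append((x0, x1, x2))

skeleton = learn_skeleton(data, n_vars=3)
print(sorted(tuple(sorted(e)) for e in skeleton))
```

On this toy chain, X0 and X2 are strongly correlated, yet the edge between them is dispensable because the information they share is fully accounted for by the indirect path through X1, so the sketch should keep only the edges 0-1 and 1-2.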

 

Scientific objectives, methodology & expected results

The present PhD project aims at extending this causal network learning method to analyze time-resolved cell biology data, for which the information about cellular dynamics can facilitate the discovery of novel cause-effect functional processes. In particular, we will include time-delayed effects of unobserved latent variables, which are ubiquitous in cell biology data (e.g. a common regulatory gene unobserved in the available data). This causal analysis will thus go beyond Granger causality, which can produce spurious causal associations based on time delays because it excludes the presence of latent common causes a priori. To this end, we will extend our unsupervised learning of contemporaneous graphical models (i.e. when temporal information is not available) to reconstruct time-unfolded graphical models including latent variables, where each variable is represented by several nodes corresponding to different (relative) time points. This will make it possible to adapt our timeless network learning method to time series datasets with the additional assumption that future events cannot cause past ones. This causal network learning method for time series data will then be applied to analyze two types of high-throughput time-resolved cell biology data in collaboration with research teams from the University of Rome Tor Vergata (Martinelli lab) and Institut Curie, Paris (MC Parrini in Mechta-Grigoriou lab and B Sorre in Hersen lab).
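As a rough illustration of what a time-unfolded representation could look like, the sketch below turns a multivariate time series into samples over (variable, lag) nodes and records the constraint that a more recent node cannot cause an older one. The data layout, the function names `unfold` and `arrowheads_from_time`, and the choice of a fixed maximum lag are all assumptions made for this example, not the project's actual design.

```python
def unfold(series, max_lag):
    """Turn a multivariate time series (one dict of values per time step)
    into samples over (variable, lag) nodes, where lag 0 is the present
    and lag k is the value k steps earlier."""
    names = sorted(series[0])
    nodes = [(v, lag) for lag in range(max_lag + 1) for v in names]
    rows = [
        {(v, lag): series[t - lag][v] for (v, lag) in nodes}
        for t in range(max_lag, len(series))
    ]
    return nodes, rows


def arrowheads_from_time(edges):
    """For each edge joining nodes at different lags, return (older, newer).
    Since the future cannot cause the past, such an edge must carry an
    arrowhead at `newer`. Whether `older` is a genuine cause or only shares
    a latent common cause with `newer` is left to the learning method.
    Edges between nodes at the same lag are not constrained here."""
    constrained = []
    for a, b in edges:
        if a[1] != b[1]:
            older, newer = (a, b) if a[1] > b[1] else (b, a)
            constrained.append((older, newer))
    return constrained


# Toy series with two variables observed at five time steps.
series = [{"x": t, "y": 10 * t} for t in range(5)]
nodes, rows = unfold(series, max_lag=2)
print(nodes)
print(len(rows), rows[0])

edges = [(("x", 0), ("y", 1)), (("x", 0), ("y", 0))]
print(arrowheads_from_time(edges))
```

Each row of the unfolded table can then be treated as one sample by a network learning method that is otherwise unaware of time, with the time-ordering constraints applied at the orientation step.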

The first application concerns the analysis of time-lapse images of tumor ecosystems reconstituted ex vivo by MC Parrini and collaborators (Nguyen et al. 2018) using the tumor-on-chip technology to study the effects of anti-cancer drugs on a reconstituted tumor microenvironment including cancer cells, immune cells, cancer-associated fibroblasts (CAF), and endothelial cells. Cellular features such as cell geometry, texture, velocity, division, apoptosis, transient cell-cell interactions and persistent contacts will first be extracted from the raw images through advanced segmentation and vision-based AI techniques thanks to the expertise of the Martinelli lab (Comes et al. 2019, Callari et al. 2019, Nguyen et al. 2018), where the selected PhD candidate will be trained as part of the international mobility experience of the AI4theSciences COFUND programme. The PhD candidate will be trained, in particular, in the use of novel Artificial Intelligence approaches such as Deep Learning architectures (Mencattini et al. 2020) for extracting morphodynamic properties in the environment under study. The evolution of the metabolic states and, more generally, the culture media content will also be measured at specific times of the experiments, using chemical sensor arrays previously developed by the Martinelli lab (Capuano et al. 2018, Lavra et al. 2015). Causal network inference between extracted cellular features and environmental or therapeutic variables will then be performed to uncover possible causal effects of different chemotherapies and/or immunotherapies on the tumor ecosystems reconstituted ex vivo. This research has the potential to impact precision medicine in cancer treatment by allowing multiple treatments to be screened on well-controlled cellular ecosystems reconstituted ex vivo from the patient’s tumor. In the long run, cancer immunotherapies are transforming anti-cancer treatment practices, extending patients’ survival, improving their quality of life, and generating huge economic and market dynamics.

The second application concerns the analysis of gene expression data during early mammalian development using droplet-based RNA-seq and microfluidic technologies for single-cell transcriptomics. This is an experimental project, led by Benoît Sorre in the Hersen lab, on the differentiation of human embryonic stem cells confined in 2D on micro-patterned substrates. This method makes it possible to recapitulate in vitro the differentiation and spatial organization of cells that happen in vivo during gastrulation, and thus offers a window into early human development that is not accessible by other means (Warmflash et al. 2014, Plouhinec et al. 2020). In particular, scRNA-seq analysis has been performed at 5 time points over the time course of differentiation, which yielded a detailed inventory of the cell types emerging during this process. In this PhD project, we will dissect these differentiation pathways in terms of time-resolved regulation between expressed genes. However, as single-cell transcriptomic techniques cannot directly measure gene expression in the same cell over time (individual cells are destroyed to measure gene expression), we will resort to an intermediate step of pseudotime assignment to each cell along the differentiation pathway using state-of-the-art approaches in the field (Saelens 2019). A cell type of particular interest arising on the 2D micro-patterned substrates corresponds to primordial germ cells (PGCs), the ancestors of the gametes used for reproduction. As only a handful of these cells are present per embryo, little is known about the mechanisms by which they arise during development, despite their medical importance in inherited genetic diseases. We thus expect that our approach will shed new light on the emergence of this elusive yet essential cell type during early embryonic development.

In these two applications, we will also perform 'Mediation Analysis' to quantify the relative causal contributions of direct and indirect pathways in the reconstructed causal networks. The Isambert lab already has expertise in Mediation Analysis in a different context, namely, to quantify direct versus indirect effects in cancer susceptibility and evolutionary genomics of vertebrates (Singh et al. 2012). While early Mediation Analysis (Pearl 2009) was limited to simple graphs and single mediators, recent advances in the field (Perkovic et al. 2018) now make it possible to perform mediation analysis on the broad class of graphs reconstructed by MIIC. This opens exciting new avenues to implement generalized mediation analysis directly on MIIC output networks including latent and confounding variables. Ultimately, some of the most interesting predictions of novel cause-effect functional processes and their quantification through Mediation Analysis will be tested by the experimental partners through targeted perturbations on the systems of interest (e.g. using specific drugs, RNA interference, gene knockout or CRISPR-based gene editing).
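To make the direct versus indirect decomposition concrete, here is a minimal sketch of the classic linear, single-mediator case (X -> M -> Y plus a direct X -> Y path), i.e. the simple setting that early mediation analysis was limited to, not the generalized graph-based analysis planned for MIIC networks. The simulated coefficients (0.8, 0.5, 0.3) and all function names are assumptions chosen for illustration.

```python
import random


def cov(u, v):
    """Empirical covariance of two equally long sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


def mediation_effects(x, m, y):
    """Linear decomposition of the effect of X on Y with one mediator M.
    direct   = coefficient of X in the least-squares fit of Y on (X, M)
    indirect = (slope of M on X) * (coefficient of M in the same fit)
    total    = slope of Y on X alone, equal to direct + indirect."""
    sxx, smm, sxm = cov(x, x), cov(m, m), cov(x, m)
    sxy, smy = cov(x, y), cov(m, y)
    a = sxm / sxx                                # X -> M
    det = sxx * smm - sxm ** 2                   # 2x2 normal equations
    direct = (sxy * smm - smy * sxm) / det       # X -> Y, holding M fixed
    b = (smy * sxx - sxy * sxm) / det            # M -> Y, holding X fixed
    return {"direct": direct, "indirect": a * b, "total": sxy / sxx}


# Simulated data: M = 0.8 X + noise, Y = 0.3 X + 0.5 M + noise.
random.seed(1)
x = [random.gauss(0, 1) for _ in range(20000)]
m = [0.8 * xi + random.gauss(0, 1) for xi in x]
y = [0.3 * xi + 0.5 * mi + random.gauss(0, 1) for xi, mi in zip(x, m)]

effects = mediation_effects(x, m, y)
print(effects)
```

With these simulated coefficients, the estimates should come out close to a direct effect of 0.3, an indirect effect of 0.8 × 0.5 = 0.4, and a total effect of 0.7. This additive split relies on linearity and on the absence of unobserved confounders, which is precisely what more general approaches aim to relax.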

Finally, we anticipate that this causal network learning project for time series data will have many other potential applications beyond time-resolved cell biology data. In particular, we intend to analyze some longitudinal clinical data, such as the follow-up of biomarkers indicating likely graft rejection several months or years after kidney transplant from the Necker Hospital, Paris.

 

International mobility

A number of visits to the Martinelli lab (Dept. Electronic Engineering, University of Rome Tor Vergata, Italy), to learn automatic feature extraction from time-lapse images of cellular ecosystems using ML and/or vision-based AI approaches, will bring some international mobility experience to the selected PhD candidate.

 

Thesis supervision

Hervé Isambert and Eugenio Martinelli

 

PSL

Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Science, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).

More Information

Benefits

  • Opportunity to conduct academic research at one of the world's top 100 universities.
  • High-quality doctoral training leading to a PhD degree, prepared at Institut Curie and awarded by PSL.
  • Access to cutting-edge infrastructures for research & innovation.
  • Appointment for a period of 36 months (job contract issued by the involved component school of PSL) based on a salary of 3100 € gross employer (including employer tax) per month, or approximately 2228 € gross salary per month.
  • Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Possible complementary activities may be accepted or proposed by the co-supervisors (maximum of 64 hours/year for teaching, 32 days/year for specific missions).
  • Short stay(s) or secondment in France or abroad are expected.
  • An international environment supported by the adherence to the European Charter & Code.
  • Access to an AI training package, with a strong interdisciplinary focus, together with a Career Development Plan.

Eligibility criteria

  • Applicants must have a Master’s degree (or be in the process of obtaining one) or a university degree equivalent to a European Master’s (5-year duration) to be eligible at the time of the deadline of the relevant call.
  • There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
  • Applicants must declare to be available to start the programme on schedule.

For submitting your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...

 

The online application should contain the following documents:

  • English translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
  • International curriculum vitae and a cover letter explaining the applicant's reasons for preparing a PhD, why he/she is applying to this offer, and his/her professional project (guidelines will be given to applicants to help them write the letter).
  • Two academic reference letters.
  • A statement duly signed on the mobility rules, availability, and conflicts of interest.

 

Applicants can apply to only one PhD project among the available ones. Multiple applications by one candidate will automatically make all of his/her applications ineligible.

Selection process

The applications will be checked by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence; experience; motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.

All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found on the website of the programme, in greater detail.

 

The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit-based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.

Additional comments

A recognized public interest foundation since 1921, Institut Curie has worked to fulfil its three missions since its founding by Marie Curie, namely research, care and the preservation and transfer of knowledge. This multi-disciplinary approach, part of the bylaws of the foundation, is the DNA of Institut Curie.

In the bylaws of the foundation known as Institut Curie, we propose the development of the following, in the interests of science and of patients, via close interdisciplinary cooperation among staff:

1- Basic scientific, translational and clinical research in physics, chemistry, biology and radiobiology, to put science to work for people and to help fight diseases, in particular cancer.

2- Diagnostics, follow-up and care given to patients within a hospital group assimilated with a cancer research center, belonging to the Unicancer federation.

3- Development of research and access to innovation, which requires Institut Curie to forge ties with economic players that will help develop innovations to assist patients and/or to improve scientific knowledge.

4- Preservation and transfer of knowledge in the aforementioned areas, in particular through teaching and museology activities.

For Institut Curie, teaching is a function that’s inseparable from research and treatment, and as such is fully part of the foundation's social missions. For the preservation of knowledge, Institut Curie is the keeper of the work conducted by the Curie and Joliot-Curie families, and has a duty to pass on the history of this work to a broad audience.

Required Research Experiences

  • RESEARCH FIELD
    Computer science
  • YEARS OF RESEARCH EXPERIENCE
    1 - 4

Offer Requirements

  • REQUIRED EDUCATION LEVEL
    Computer science: Master Degree or equivalent
  • REQUIRED LANGUAGES
    ENGLISH: Excellent

Skills/Qualifications

  • Given the broad interdisciplinary nature of the project, the PhD candidate is expected to have a Computer science, Applied maths, Engineering or Physics background.
  • Clear interest in applying advanced ML/AI methods to analyze a broad range of large scale biological data.

Work location(s)
1 position(s) available at
Institut Curie
France
Paris
75005
11, rue Pierre et Marie Curie

EURAXESS offer ID: 587588

Disclaimer:

The responsibility for the jobs published on this website, including the job description, lies entirely with the publishing institutions. The application is handled uniquely by the employer, who is also fully responsible for the recruitment and selection processes.

 
