
PhD position 09 – MSCA COFUND, AI4theSciences (PSL, France) - “LIterary Success, Style and Artificial Intelligence (LISAI)”

This job offer has expired

    Université PSL
    Literature › European literature
    First Stage Researcher (R1)
    26/02/2021 23:00 - Europe/Brussels
    France › Montrouge
    H2020 / Marie Skłodowska-Curie Actions COFUND


“Artificial intelligence for the Sciences” (AI4theSciences) is an innovative, interdisciplinary and intersectoral PhD programme, led by Université Paris Sciences et Lettres (PSL) and co-funded by the European Commission. Supported by the European innovation and research programme Horizon 2020-Marie Skłodowska-Curie Actions, AI4theSciences is designed to train a new generation of researchers at the highest academic level in their main discipline (Physics, Engineering, Biology, Human and Social Sciences) and to help them master the latest Artificial Intelligence and Machine Learning technologies that apply to their own field.

26 doctoral students will join the PSL university's doctoral schools in 2 academic cohorts to carry out work on subjects suggested and defined by PSL's scientific community. The 2020 call will offer up to 15 PhD positions on 24 PhD research projects. The candidates will be recruited through HR processes of high standard, based on transparency, equal opportunities and excellence.


Description of the PhD subject: “LIterary Success, Style and Artificial Intelligence (LISAI)”


Context - Motivation

Large literary corpora are now available (comprising several hundred, or even thousands of, works), and natural language processing (NLP) tools are now quite robust and accurate. For example, it is now possible to syntactically parse thousands of novels in only a few hours on a standard computer.

The use of advanced NLP techniques for the analysis of literary corpora has produced original studies, for instance on modeling suspense, characters or interaction networks. This is a very active research theme in the English-speaking world, but much less so in France, even though powerful NLP tools and accessible corpora also exist for French.

From the standpoint of literary studies, the existence of large corpora and of methods to process them allows for new approaches to long-standing questions about style, authorship, genre or literary quality, for instance. It also has the potential to generate new questions or theories beyond the traditional canon of literary studies.

In this context, this project will challenge the notion of the masterpiece (chef d'œuvre) and will interrogate the literary canon. Masterpieces are usually considered to be singularities, exceptions when compared to the literary production of their time, and, as such, hard to study with a quantitative or serial approach.

The aim of this project will be to assess whether the success or cultural importance of written literary works can be matched to quantifiable textual features, and whether, consequently, works can be automatically labelled through artificial intelligence techniques.


Scientific Objectives, Methodology & Expected results

The candidate will focus on a corpus drawn from a single literary genre from a given time period (e.g. 20th-century novels), available in digital form. The analysis may include several of the following elements:

  • the analysis of stylometric features, of grammatical and lexical nature (lexicon, function words, affixes, etc.), in terms of frequency, variety, recurring patterns, etc.
  • the identification of the narrative progression (using techniques like topic modeling for example),
  • the application of "sentiment analysis" techniques adapted to the novel,
  • the identification of syntactic patterns, co-occurrences and their distribution in the texts (textual topology),
  • the identification of narrative structures or narrative breaks, using text segmentation techniques for example,
  • diachronic analysis of sub-genres,
  • the contrastive analysis of different kinds of works (e.g. popular vs "serious" novels, etc).
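As a rough illustration of the first item in the list above, the following sketch computes two simple stylometric features in plain Python: relative function-word frequencies and the type-token ratio. The function-word list, the naive tokenizer and the function names are illustrative assumptions, not part of the project description; a real study would likely rely on a proper French tokenizer and a much longer word list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of French function words (assumption).
FUNCTION_WORDS = ("de", "la", "le", "et", "que", "il", "elle", "ne", "pas")


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive tokenizer)."""
    return re.findall(r"\w+", text.lower())


def stylometric_features(text, function_words=FUNCTION_WORDS):
    """Return relative function-word frequencies and the type-token ratio."""
    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text contains no tokens")
    counts = Counter(tokens)
    total = len(tokens)
    # Relative frequency of each function word (0.0 when the word is absent).
    features = {f"fw_{word}": counts[word] / total for word in function_words}
    # Lexical variety: number of distinct tokens divided by number of tokens.
    features["type_token_ratio"] = len(counts) / total
    return features


# Example call on a toy sentence; real input would be a whole novel.
features = stylometric_features("Le chat et le chien ne dorment pas.")
```

Feature dictionaries of this kind, computed per work, could then be compared across groups of works (for example popular vs "serious" novels) or passed to a classifier.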

The expected result of such an analysis is the identification of typical features and structures, and their confrontation with the discourse of critics and other literary specialists, or with whatever proxies for the success of the works the PhD candidate finds appropriate. The position requires a certain familiarity with recent learning techniques used in NLP and in stylometry.

The analysis will be based on existing tools when available, but will also require the development of new tools, adapted to the task and based on manual annotations. The research will challenge the state of the art in computational linguistics, since literary texts are heterogeneous, varied and temporally dynamic. The language used in the 18th century is not the same as that of the 20th century. Style varies from one genre to another, and even within a single genre. Each author has a specific idiolect, and idiosyncrasies may be hard to process (a typical example is parsing Proust's long sentences). All this means that specific challenges will have to be met and overcome, such as learning from little training data, using efficient annotation procedures (active learning, for example) or developing robust techniques (to ensure good annotation quality despite corpus diversity).
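To make the mention of active learning more concrete, here is a minimal sketch of uncertainty sampling, one common active learning strategy (the project description does not specify which strategy would be used). Given a model's predicted label probabilities for unlabelled passages, it selects the passages the model is least sure about, so that human annotators label those first. The function names and the data layout (a dictionary from passage identifier to a probability list) are assumptions made for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, k):
    """Return the ids of the k items with the most uncertain predictions.

    `predictions` maps an item id to the model's predicted label
    probabilities for that item; higher entropy means less certainty.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy predictions for three unlabelled passages (made-up numbers).
predictions = {
    "passage_a": [0.50, 0.50],
    "passage_b": [0.90, 0.10],
    "passage_c": [0.99, 0.01],
}
to_annotate = select_for_annotation(predictions, k=2)
```

In an annotation loop, the selected items would be labelled by hand, added to the training data, and the model retrained before the next selection round.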



  • Year 1:
    • months 1 to 6: constitution of training and testing corpora from available digital documents; study of the state of the art (bibliography)
    • months 7 to 12: elaboration of definitions of literary achievement; further work on the corpus to integrate measurements of achievement (human annotation, mining of secondary sources, etc.); initial experiments using ML
  • Year 2:
    • months 13 to 18: design and implementation of the experimental protocol
    • months 19 to 24: interpretation of the first results; corrections to protocol and/or corpus
  • Year 3:
    • months 25 to 30: final results of the experiments; interpretation; beginning of thesis writing
    • months 31 to 36: writing and corrections


International mobility

The PhD candidate will be encouraged to undertake a long research stay in Antwerp (BE) as a Visiting Research Student, under the supervision of Mike Kestemont, to explore the methods developed there. Kestemont has developed original research in computational text processing, ranging from stylistics to medieval philology.

The candidate will also be part of the international research network Cyclades, led by Thierry Poibeau and funded by CNRS. Cyclades includes a number of first-class labs and national libraries (the Stanford Literary Lab, the Göttingen digital humanities lab, the British Library, the Turing Institute and the University of Cambridge in the UK, and Lattice, the SciencesPo médialab and the BnF in France). The goal is to explore natural language processing for large corpora of literary texts. The PhD is fully aligned with the objectives of this research network and will benefit from its rich international collaborations. All the partners conduct inherently interdisciplinary research, mixing literature, linguistics and computer science (esp. machine learning and artificial intelligence).


Thesis supervision

Thierry Poibeau and Jean-Baptiste Camps



Created in 2012, Université PSL aims to develop interdisciplinary training programmes and science projects of excellence among its members. Its 140 laboratories and 2,900 researchers carry out high-level disciplinary research, both fundamental and applied, fostering a strong interdisciplinary approach. The scope of Université PSL covers all areas of knowledge and creation (Sciences, Humanities and Social Sciences, Engineering, the Arts). Its eleven component schools gather 17,000 students and have won more than 200 ERC grants. PSL was ranked 36th in the 2020 Shanghai ranking (ARWU).


More Information


  • Opportunity to conduct academic research at one of the world's top 100 universities.
  • High-quality doctoral training leading to a PhD degree, prepared within Ecole Normale Supérieure - PSL and awarded by PSL.
  • Access to cutting-edge infrastructures for research & innovation.
  • Appointment for a period of 36 months (job contract issued by the relevant component school of PSL), based on a total employer cost of 3100 € per month (including employer contributions), which corresponds to a gross salary of approximately 2228 € per month.
  • Job contract under the French labour legislation in force, respecting health and safety, and social security: 35 hours per week contract, 25 days of annual leave per year (“congés annuels”). Complementary activities, if any, may be accepted or proposed by the co-supervisors (maximum of 64 hours/year for teaching, 32 days/year for specific missions).
  • Short stay(s) or secondment in France or abroad are expected.
  • An international environment supported by the adherence to the European Charter & Code.
  • Access to an AI training package, with a strong interdisciplinary focus, together with a Career Development Plan.

Eligibility criteria

  • Applicants must have a Master’s degree (or be in the process of obtaining one) or a university degree equivalent to a European Master’s (5-year duration) to be eligible at the deadline of the relevant call.
  • There are no nationality or age criteria, but applicants must not have resided or carried out their main activity (work, studies, etc.) in France for more than 12 months in the 3 years immediately before the deadline of the call (MSCA Mobility rule).
  • Applicants must declare that they are available to start the programme on schedule.

To submit your online application, go to: https://www.psl.eu/recherche/grands-projets-de-recherche/projets-europee...


The online application should contain the following documents:

  • English-translated transcripts from the Master’s degree (or equivalent 5-year degree). A copy of the Master’s degree or a certificate of achievement will be required later on for the final registration.
  • International curriculum vitae and a cover letter explaining the applicant's reasons for undertaking a PhD and for applying to this offer, as well as his/her professional project (guidelines will be provided to help applicants write the letter).
  • Two academic reference letters.
  • A statement duly signed on the mobility rules, availability, and conflicts of interest.


Applicants can apply to only one PhD project among the available ones. Multiple applications from one candidate will automatically make all of that candidate's applications ineligible.

Selection process

The applications will be checked by the Management Team for eligibility and completeness. Afterwards, the applications will be reviewed by the Selection Committee. In the pre-selection round (March-April 2021), applicants will be rated using a scoring system based on 3 criteria (academic excellence, experience, and motivation and qualities). A shortlist of qualified applicants will be interviewed during the selection round (June 2021) to further assess their qualifications and skills according to the predefined selection criteria.

All information regarding the applications (criteria, composition of the Selection Committee, requirements) can be found in greater detail on the website of the programme.


The selection and recruitment processes of the PhD student will be in accordance with the European Charter for Researchers and the Code of Conduct for the Recruitment of Researchers. The recruitment process will be open, transparent, impartial, equitable, and merit-based. There will be no discrimination based on race, gender, sexual orientation, religion or belief, disability, or age.

Additional comments

LATTICE (UMR CNRS 8094) is a research unit affiliated with CNRS, Ecole Normale Supérieure - PSL and the Université Sorbonne Nouvelle. The lab includes more than 40 people: around 20 permanent staff (researchers, lecturers and engineers) and around 20 research associates (PhD students and post-docs). Main research areas at LATTICE include linguistics, natural language processing and digital humanities, and the lab is involved in various research projects mixing these fields. Our goal is to develop cutting-edge research in natural language processing (e.g. in syntactic parsing), while keeping an eye both on the interpretability of our models and on their applicability to specific problems, for example the processing of under-resourced languages. LATTICE is involved in the 3IA center of Paris (PRAIRIE, Paris Artificial Intelligence Research Institute) through Thierry Poibeau's chair dedicated to natural language processing and digital humanities. LATTICE is also part of the EUR TRANSLITTERAE, which gathers most of the Humanities labs at the Ecole Normale Supérieure - PSL. The lab is also involved in numerous national and international collaborations.


The Ecole Normale Supérieure - PSL is a leading multidisciplinary institution that focuses on training through research. The ENS - PSL defines and applies scientific and technological research policies from a multidisciplinary and international perspective, and maintains close relationships with prestigious partners in France and abroad. It encompasses fourteen teaching and research departments, spanning the main disciplines of the Humanities and Sciences. The ENS - PSL currently has a staff of almost 800 lecturers, ENS - PSL, CNRS or associated researchers, and post-doc researchers. Within its departments, the ENS - PSL includes 40 research units identified as ENS - PSL, INSERM or INRIA, encompassing ENS - PSL and CNRS staff as well as 300 foreign researchers and 650 doctoral students. The ENS - PSL respects the principles of the European Charter and Code for Researchers and is engaged in the HRS4R certification process.

Required Research Experiences

    Computer science: 1 - 4 years

Offer Requirements

    Computer science: Master Degree or equivalent
    ENGLISH: Excellent


  • Graduate in Digital Humanities or Computational Linguistics.
  • Experience in computational methods and machine learning is required.
  • A specialization in Stylistics, Philology or Literary studies would be a plus.

Specific Requirements

  • Candidates will have to demonstrate a good command of French literature and some programming skills, preferably in Python and/or R.

Work location(s)
1 position(s) available at
LATTICE, Ecole Normale Supérieure - PSL
1, rue Maurice Arnoux

EURAXESS offer ID: 579056



