EURAXESS Researchers in motion

Job offer

CNRS - National Center for Scientific Research
26 May 2024

Job Information

Organisation/Company
CNRS
Department
Institut des Systèmes Intelligents et de Robotique
Research Field
Engineering
Computer science
Mathematics
Researcher Profile
First Stage Researcher (R1)
Country
France
Application Deadline
Type of Contract
Temporary
Job Status
Full-time
Hours Per Week
35
Offer Starting Date
Is the job funded through the EU Research Framework Programme?
Not funded by an EU programme
Is the Job related to staff position within a Research Infrastructure?
No

Offer Description

The PhD candidates will join one of the two Sorbonne-Université laboratories (ISIR and STIH) involved in this program, depending on the subject chosen.

ISIR: the Institute of Intelligent Systems and Robotics is a multi-disciplinary laboratory of Sorbonne Université and CNRS. The PhD applicants will be hosted by the “Machine Learning and Deep Learning for Information Access” team (MLIA - https://www.isir.upmc.fr/equipes/mlia/), which focuses on machine learning and its applications, particularly in language processing. Located on the Jussieu campus in central Paris, ISIR has over 250 members and is a major player in Artificial Intelligence and Robotics in Europe (https://www.isir.upmc.fr).

STIH: “Sens, Texte, Informatique, Histoire” (Meaning, Text, Computer Science, History) is a multi-disciplinary laboratory of Sorbonne Université based in the Faculty of Humanities. The successful candidate will join the “Computational Linguistics” team, which develops research at the crossroads of linguistics and computer science, particularly in the automatic processing of written and spoken language. The candidate will be supported by a team of engineers from the CERES service unit (https://ceres.sorbonne-universite.fr).

The general framework of the research is the use of large generative models (large language models or LLMs) in a participatory democracy platform. Our main questions deal with the detection and correction of the biases of these LLMs when they are used to assist administrators or users of this platform: consultation and analysis of debates in progress (summarization, translation), interventions in the debate (writing assistance), site moderation, etc. Three PhD theses dealing more specifically with the issues of automatic summarization, automatic translation, and writing assistance are proposed in this context.

Scientific context

In just a few years, language processing tools based on large generative language models have reached very high levels of performance on complex tasks. Today, they are widely used in our digital work environments to access, analyze, and reformulate information, or to generate original content. With the widespread adoption of these technologies, analyzing the actual performance, risks, and limitations of these models is becoming increasingly important. A growing body of literature in language processing therefore focuses on the “social biases” of these large models, understood as a tendency to produce texts that reflect a form of preference in favor of or against specific social groups (based on gender, religion, ethnicity, etc.).

Such a preference can be quantitative (certain points of view or groups are over- or under-represented) or qualitative (their representation tends to be derogatory, reflects stereotypes, or attributes properties to them in an improper, inaccurate, or inappropriate way). This misrepresentation may be explicit or indirect, in the latter case conveyed by lexical associations or by textual implications and presuppositions embedded in complex statements. Finally, this preference may result in unequal model performance across the socially constructed user categories to which the models are applied. Bias assessment thus becomes an integral part of the process of evaluating, comparing, and qualifying language models.
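The last kind of preference, unequal performance across user categories, is straightforward to make concrete: given an evaluation set in which each example is annotated with the (socially constructed) group it concerns, one can compare per-group accuracies. A minimal sketch, with function names and toy data of our own (not taken from the programme description):

```python
from collections import defaultdict

def accuracy_by_group(examples):
    """examples: iterable of (group, correct) pairs; returns accuracy per group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, correct in examples:
        totals[group] += 1
        hits[group] += int(correct)
    return {g: hits[g] / totals[g] for g in totals}

def performance_gap(examples):
    """Largest difference in accuracy between any two groups."""
    acc = accuracy_by_group(examples)
    return max(acc.values()) - min(acc.values())

# Toy evaluation log: the model is right 2/2 times on group "a"
# but only 1/2 times on group "b", i.e. a 0.5 accuracy gap.
log = [("a", True), ("a", True), ("b", True), ("b", False)]
print(performance_gap(log))  # 0.5
```

The same pattern applies to any per-example quality score (BLEU for translation, ROUGE for summarization) in place of the boolean `correct` flag.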

Several research directions have thus emerged within the language processing community, focusing on: (a) the identification and quantification of these social biases through the analysis of the internal representations of LLMs, first of isolated words, then of words in context, and finally of complete utterances or texts; (b) the identification and quantification of these biases in downstream tasks: reference resolution, machine translation, sentiment analysis, free text generation; (c) the development of methods to reduce biases; (d) the development of strategies to improve the transparency of LLMs.
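Direction (a) is often operationalized with association tests over embedding spaces, in the style of the Word Embedding Association Test (WEAT): two sets of target words (e.g. names associated with different groups) are compared against two sets of attribute words, and the difference in mean cosine similarity is normalized into an effect size. The sketch below uses hand-made 2-d toy vectors; the function names and data are our own illustration, not code from the project:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """How much more w resembles attribute set A than attribute set B."""
    return (np.mean([cosine(w, a) for a in A])
            - np.mean([cosine(w, b) for b in B]))

def weat_effect_size(X, Y, A, B):
    """WEAT-style effect size: difference in mean association between
    target sets X and Y, in units of the pooled standard deviation."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (np.mean(sx) - np.mean(sy)) / np.std(sx + sy, ddof=1)

# Toy embeddings: X-words point towards attribute A, Y-words towards
# attribute B, so the effect size comes out clearly positive.
X = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
Y = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
print(weat_effect_size(X, Y, A, B))
```

On real data the vectors would come from a trained model (static embeddings, or contextual representations averaged over template sentences, as in direction (a)'s progression from isolated words to full utterances).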

Requirements

Research Field
Engineering
Education Level
PhD or equivalent
Research Field
Computer science
Education Level
PhD or equivalent
Research Field
Mathematics
Education Level
PhD or equivalent
Languages
FRENCH
Level
Basic
Research Field
Engineering
Years of Research Experience
None
Research Field
Computer science
Years of Research Experience
None
Research Field
Mathematics
Years of Research Experience
None

Additional Information

Website for additional job details

Work Location(s)

Number of offers available
3
Company/Institute
Institut des Systèmes Intelligents et de Robotique
Country
France
City
PARIS 05

Contact

City
PARIS 05
Website
