Group Fairness For Multiple Group Scenario in Text Classification

Student: Lapko Daria

Supervisor: Tamara Voznesenskaya

Faculty: Faculty of Computer Science

Educational Programme: Data Science and Business Analytics (Bachelor)

Final Grade: 8

Year of Graduation: 2024

This thesis investigates intersectional debiasing in text classification models, with the aim of mitigating biases that arise from the intersection of multiple demographic attributes such as gender, race, age, and country. Existing debiasing techniques often focus on single attributes in isolation, overlooking important sources of unfairness that manifest at the intersections of these attributes. To address this gap, the thesis proposes using a joint attribute that encodes the combinations of protected attributes into a single variable. This makes it possible to optimize directly for fairness across intersectional subgroups, rather than only across individual attributes. Three debiasing methods are evaluated: Least-squares Concept Erasure (LEACE), Adversarial Training (Adv), and Balanced Training with Equal Opportunity (BTEO). These techniques are applied to the Multilingual Twitter Corpus (MTC) dataset, which contains hate speech annotations along with inferred author demographics. The thesis tests three key hypotheses: 1) debiasing on single attributes is insufficient to substantially improve fairness on the joint attribute, 2) debiasing on a single attribute can improve fairness on the joint attribute, and 3) there exist correlations between biases in different attributes that can be leveraged for cross-attribute debiasing. Experiments are conducted using the FairLib framework, with accuracy and fairness metrics such as TPR-GAP and Distance to Optimum (DTO) evaluated across different debiasing methods and attributes. The results provide insights into the effectiveness of intersectional debiasing and the trade-offs between fairness and performance. The thesis concludes with a discussion of future research directions to further advance the state of the art in fair and inclusive text classification.
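The joint attribute and the two fairness measures named in the abstract can be illustrated with a minimal sketch. It is not the thesis code and does not use the FairLib API. The function names `joint_attribute`, `tpr_gap`, and `dto` are hypothetical, and the metric definitions are assumptions: TPR-GAP is taken as the root mean square of per-group deviations from the overall true positive rate, and DTO as the Euclidean distance from an (accuracy, fairness) point to an assumed optimum of (1, 1).

```python
import numpy as np


def joint_attribute(*attrs):
    """Map each unique combination of attribute values to one integer id."""
    stacked = np.stack([np.asarray(a) for a in attrs], axis=1)
    _, joint = np.unique(stacked, axis=0, return_inverse=True)
    return joint.reshape(-1)


def tpr_gap(y_true, y_pred, groups):
    """RMS deviation of per-group TPR from the overall TPR (assumed definition)."""
    y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
    positives = y_true == 1
    overall_tpr = np.mean(y_pred[positives] == 1)
    gaps = []
    for g in np.unique(groups):
        mask = positives & (groups == g)
        if mask.any():  # skip groups with no positive examples
            gaps.append(np.mean(y_pred[mask] == 1) - overall_tpr)
    return float(np.sqrt(np.mean(np.square(gaps))))


def dto(accuracy, fairness, optimum=(1.0, 1.0)):
    """Euclidean distance from (accuracy, fairness) to an assumed optimum point."""
    return float(np.hypot(optimum[0] - accuracy, optimum[1] - fairness))


# Toy data (made up for illustration): four positive examples,
# one per gender x race subgroup.
gender = np.array([0, 0, 1, 1])
race = np.array([0, 1, 0, 1])
y_true = np.array([1, 1, 1, 1])
y_pred = np.array([1, 0, 0, 1])

joint = joint_attribute(gender, race)
gap_gender = tpr_gap(y_true, y_pred, gender)
gap_race = tpr_gap(y_true, y_pred, race)
gap_joint = tpr_gap(y_true, y_pred, joint)
```

On this toy data the gap is zero for gender alone and for race alone, yet it is 0.5 on the joint attribute, which is the kind of intersectional unfairness that single-attribute evaluation can miss.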

Full text (added May 27, 2024)
