The research was carried out by two scientists from the USC Viterbi School of Engineering, Mayank Kejriwal and Akarsh Nagaraj. Kejriwal, who works at the University of Southern California's Information Sciences Institute (ISI), decided to study the topic after reading recent scientific papers on gender bias and drawing on his own expertise in natural language processing (NLP). Nagaraj is a machine learning engineer.
Which books were studied?
The study was limited to 3,000 titles available through Project Gutenberg. The project was launched in 1971 by Michael Hart, who is often called the inventor of the e-book because he was the first to share a digital copy of a printed text with other users: the American Declaration of Independence. By 2016, Project Gutenberg offered more than 50,000 e-books, and the scientists chose its collection so that they could not be accused of bias in selecting titles, since it covers a very diverse range of works. The books studied ranged from science fiction and adventure stories to other fiction and poetry. Based on their analyses, the researchers found that men outnumber women in this literature by a ratio of 4:1. Importantly, however, most of the titles in the project were published before 1924 (they can be distributed freely because their copyright has expired), so the study concerns books that are at least a century old.
AI used to draw attention to inequality
In the study, the scientists used, among other tools, a natural language processing technique called named entity recognition (NER), which automatically identifies what individual fragments of text refer to. For example, it can determine whether a given phrase names a person, an object or a place, and help establish the gender of a character. The researchers also counted the female and male pronouns in each of the books analyzed. Another method checked how many of each book's main characters were women. Interestingly, the gap between the numbers of male and female characters narrows when the author is a woman: in the past, women wrote about women more often than men did.
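Full NER requires a trained statistical model (libraries such as spaCy ship one), so the sketch below covers only the simpler pronoun-counting step. It is a minimal Python illustration under stated assumptions, not the researchers' actual pipeline: the pronoun lists and the function names `count_gendered_pronouns` and `male_to_female_ratio` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pronoun lists; the article does not give the study's exact word lists.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def count_gendered_pronouns(text: str) -> dict:
    """Count female and male pronouns in `text`, ignoring case."""
    # Lowercase and split into alphabetic tokens, so "He," and "he" match alike.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in FEMALE_PRONOUNS:
            counts["female"] += 1
        elif token in MALE_PRONOUNS:
            counts["male"] += 1
    return {"female": counts["female"], "male": counts["male"]}


def male_to_female_ratio(counts: dict) -> float:
    """Return the male/female pronoun ratio; infinity if no female pronouns occur."""
    if counts["female"] == 0:
        return float("inf")
    return counts["male"] / counts["female"]


sample = "He opened the door. She smiled at him, and he handed her his hat."
counts = count_gendered_pronouns(sample)
print(counts, male_to_female_ratio(counts))
```

For the sample sentence this finds four male and two female pronouns, a ratio of 2.0. A real analysis would also have to deal with characters referred to only by name, which is where NER comes in.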
As Kejriwal pointed out:
Gender bias in literature is real. Even in classic books, we find four times fewer female characters than male ones, and this has a subconscious effect on us, the consumers of culture.
Nagaraj added that books are a window into the past, and that the works of the authors studied offer insight into how people perceived the world and how that perception changed over time.
The researchers have been criticized, among other things, for not including transgender and nonbinary people in their work. They partly agree with the criticism, but explain that transgender and nonbinary people have been almost entirely ignored in literature, which makes it difficult to find material to study. They also noted that there are not yet effective tools for recognizing when “they” in a text refers to a single person who uses that pronoun.
Still, the NLP approach is free from the biases that can affect surveys conducted by people, and it also allowed the researchers to identify the traits that texts associate with each gender. Adjectives associated with women included “weak”, “beautiful”, “stupid” and “nice”, while those associated with men included “leading”, “bossy” and “strong”. The researchers hope their work will highlight the value of interdisciplinary research: in this case, artificial intelligence was used to draw attention to social issues and inequality.
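The article does not say how the adjective associations were computed. One simple way to surface such associations, shown here purely as an illustration and not necessarily the researchers' method, is to count how often an adjective appears within a few words of a gendered pronoun in the same sentence. The adjective list, the window size of 4 and the function name `adjective_associations` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical pronoun sets, as in a simple pronoun counter.
FEMALE_PRONOUNS = {"she", "her", "hers", "herself"}
MALE_PRONOUNS = {"he", "him", "his", "himself"}


def adjective_associations(text, adjectives, window=4):
    """Count how often each adjective occurs within `window` tokens of a
    female or male pronoun in the same sentence.

    Returns two Counters: (female_hits, male_hits). An adjective near both
    kinds of pronoun is counted in both.
    """
    female_hits, male_hits = Counter(), Counter()
    # Split crudely on sentence-ending punctuation so windows never cross sentences.
    for sentence in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        for i, token in enumerate(tokens):
            if token not in adjectives:
                continue
            # Tokens up to `window` positions before and after the adjective.
            context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            if any(t in FEMALE_PRONOUNS for t in context):
                female_hits[token] += 1
            if any(t in MALE_PRONOUNS for t in context):
                male_hits[token] += 1
    return female_hits, male_hits


text = "She was beautiful and kind. He was strong. She was strong too."
female, male = adjective_associations(text, {"beautiful", "kind", "strong"})
print(dict(female), dict(male))
```

Over a large corpus, comparing these counts (ideally normalized by how often each pronoun occurs) would show which adjectives lean toward one gender. A more careful analysis would use a part-of-speech tagger to find adjectives instead of relying on a fixed list.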
Then and now
We have previously written about a study confirming that male heroes are overrepresented in books. That analysis of 3,280 books published between 1960 and 2020 showed that boys outnumber girls in American children's literature, although the proportions vary from year to year and the number of female characters is growing.
Interestingly, the authors of the popular book Good Night Stories for Rebel Girls conducted an experiment with a mother and her daughter. The two visited bookshops and first removed from the shelves the books with no male characters (there were only two such titles), and then the books without a single female character, of which they found as many as 74. Next, they removed the books in which female characters appear but never speak, setting aside another 67. Finally, they took down the books in which princesses wait for a prince to rescue them. How many books were left at the end?