“I believe that all voices are equal and deserving of equal respect.” That’s what Siri responds when asked if she is a feminist.
We share Siri’s sentiment. That’s why, using computational linguistics — the technology behind voice-activated assistants — we’ve created the Gender Gap Tracker to help us analyze how Canadian media represent women’s voices.
In partnership with Informed Opinions and with support from Simon Fraser University, we’ve used the Gender Gap Tracker to download and analyze thousands of news articles posted by mainstream media outlets in Canada, including CBC, CTV, Global, Huffington Post, the National Post, the Globe and Mail and the Toronto Star. Relying on computational linguistics, our research identifies who is mentioned and who is quoted in each article, providing an accurate gender breakdown.
Text mining for social good
Computational linguistics, and the overlapping field of text mining, have already demonstrated their ability to help bring about meaningful social change.
These techniques have been used to analyze footage from police cameras during traffic stops, revealing a racial disparity: police officers use less respectful language with Black community members than with white community members, regardless of the race of the officer or the severity of the infraction.
As a result of these findings, the Oakland Police Department changed its training modules, and other police departments in the United States are considering comparable initiatives. Similarly, the SAFE Lab at Columbia University analyzes social media posts to detect who is likely to engage in gang-related violence. With this information, SAFE Lab has assisted social workers in making decisions about intervention.
Machine translation, another computational linguistics application, has been deployed in crises. The English-Haitian Creole translator developed by Microsoft within days of Haiti’s 2010 earthquake was invaluable to first responders.
Scraping and organizing data
The Gender Gap Tracker scrapes and organizes data from all the news stories published on mainstream Canadian media outlets. Then, for each article, we use Named Entity Recognition techniques to find out who is mentioned and who is quoted in the text. To avoid over-counting those who are mentioned or quoted more than once, we then perform a second level of analysis that links all mentions of the same person together. Finally, we assign a gender to each person mentioned and quoted.
The gender identification process assigns people to one of three categories: female, male and unknown. The unknown category includes cases where we’re not sure (is Alex a man or a woman?); where the gender is unknowable because the source is an organization (for example: “The police said they arrested somebody”); and cases where the person is identifiable as an individual, but uses a gender-neutral pronoun (they).
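To make the linking and gender-assignment steps concrete, here is a minimal sketch in Python. It is not the Gender Gap Tracker's actual code: the last-name matching rule, the tiny `FIRST_NAME_GENDER` table, the `ORG_NAMES` set and the example names are all assumptions made up for illustration, and a real system would rely on far richer resources and models.

```python
from collections import defaultdict

# Hypothetical lookup tables, invented for this illustration only.
FIRST_NAME_GENDER = {"maria": "female", "john": "male"}
ORG_NAMES = {"the police"}


def link_mentions(mentions):
    """Group mentions that share a last token under the longest name seen."""
    canonical = {}
    # Visit longer names first so "Maria Lopez" is registered before "Lopez".
    for mention in sorted(mentions, key=lambda m: -len(m.split())):
        canonical.setdefault(mention.split()[-1].lower(), mention)
    groups = defaultdict(list)
    for mention in mentions:
        groups[canonical[mention.split()[-1].lower()]].append(mention)
    return dict(groups)


def assign_gender(name):
    """Return 'female', 'male' or 'unknown' for a linked source."""
    if name.lower() in ORG_NAMES:
        return "unknown"  # organizations have no knowable gender
    first_name = name.split()[0].lower()
    # Names missing from the table (for example, Alex) stay unknown.
    return FIRST_NAME_GENDER.get(first_name, "unknown")


mentions = ["Maria Lopez", "Lopez", "John Smith", "Alex Chen", "Smith", "The police"]
sources = link_mentions(mentions)
genders = {name: assign_gender(name) for name in sources}
```

In this toy run, six mentions collapse into four distinct sources, so a person quoted twice is counted once. The sketch handles only the first two "unknown" cases described above (ambiguous names and organizations); detecting a gender-neutral pronoun would require looking at the surrounding text.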
Our existing system performs this analysis for English-language media in Canada. We’re now collaborating with French computational linguists to develop a version analyzing francophone media to be released later this year. We’re also working to go beyond the source categories of politician, expert, witness or victim to distinguish different types of sources and experts being quoted.
For instance, it would be interesting to see the proportion of individuals from each gender group that appear as expert opinion holders versus in other roles.
Gender parity in media by 2025
The goal for the research team using the Gender Gap Tracker is to help decision-makers in mainstream media see how well they’re representing women’s voices. Informed Opinions’ goal is to motivate journalists to achieve gender parity in Canadian public discourse by 2025. In this way, we are using computational linguistics techniques as a means to motivate social change.
Over the last few months, the Gender Gap Tracker has consistently shown an average of 74 per cent male sources versus 25 per cent female, with roughly one per cent unknown. We can do better than that. Some reporters who track the gender breakdown of their sources are already taking measures to reach parity.
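For reporters who want to track their own sources, the arithmetic behind such a breakdown is simple. The sketch below is an assumption about how one might compute it, not the tracker's implementation; the function name `gender_shares` is hypothetical, and the sample list is invented to mirror the percentages above rather than drawn from real data.

```python
from collections import Counter

CATEGORIES = ("male", "female", "unknown")


def gender_shares(source_genders):
    """Return each category's share of sources as a percentage."""
    counts = Counter(source_genders)
    total = sum(counts[c] for c in CATEGORIES)
    if total == 0:
        return {c: 0.0 for c in CATEGORIES}
    return {c: round(100 * counts[c] / total, 1) for c in CATEGORIES}


# Invented sample: 100 sources split to match the figures quoted above.
sample = ["male"] * 74 + ["female"] * 25 + ["unknown"]
shares = gender_shares(sample)
parity_gap = shares["male"] - shares["female"]  # percentage points
```

With this sample the gap between male and female sources is 49 percentage points; parity would bring it to zero.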
At the same time, the Gender Gap Tracker will provide valuable data to other researchers interested in studying the news. These potential offshoots of research should allow us to assess a range of issues, including whether mainstream media restrict women’s voices to certain topics, and whether they portray those voices with systematically more positive or more negative sentiment than male voices.
The Gender Gap Tracker is a collaboration among a team of scientists: in addition to ourselves, the team includes Mohammad Mazraeh and Vasundhara Gautam within the Discourse Processing Lab at Simon Fraser University and Alexandre Lopes at SFU’s Big Data Hub, as well as John Simpson of the University of Alberta and Alain Désilets of the National Research Council.