
‘He’ vs ‘she’ in Australian media coverage: what the language of news tells us about gender imbalance

New linguistic studies show the ratio of “he” to “she” in Australian news reporting is 3.4 to 1. AAP Image/April Fonti


In the dying days of Australia’s first female prime ministership, then PM Julia Gillard struck out against the “blue tie brigade”. If Tony Abbott was elected, she argued, women would be “once again banished from the centre of Australia’s political life”.

News Corp columnist Andrew Bolt called her a “sexist monster”. Even some of Gillard’s colleagues distanced themselves from her comments.

Gillard’s experience of sexism ranged from deeply misogynistic commentary – one radio host said she should be put in a chaff bag and thrown overboard – to the everyday, unconscious assumption that mainstream public life is, purely and simply, the domain of men.

A large new corpus of Australian newspaper articles, compiled by linguists at Lancaster University’s Corpus Approaches to Social Science Research Centre, can help us investigate the gender imbalance in Australian public life.

The collection – consisting of nearly 13,000 articles and close to 7.4 million words – provides an extremely rich data source for studying the Australian media. The data come from 18 Australian newspapers, including The Adelaide Advertiser, The Age, The Australian, The Canberra Times, The Courier Mail, The Daily Telegraph, The Sydney Morning Herald, The West Australian, The Northern Territory News and The Hobart Mercury, among others.

It includes all news articles published over 12 months from August 2015 to July 2016 that contained one of the following keywords: Australia, Australian, or Australians.

In any large set of text, the most frequently used words are the smallest – words like “the”, “to”, “and”, “of” and “a”. But not too far down this list is the male pronoun: “he”.

Of nearly 100,000 distinct words used in the collected news articles, “he” was the 16th most frequently used. By comparison, the equivalent female pronoun – “she” – was the 66th most frequently used. “She” turned up 11,765 times, while “he” appeared more than 40,000 times.

That makes the ratio of “he” to “she” in Australian news reporting 3.4 to 1.
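The counting behind a figure like this can be sketched in a few lines of Python. This is only an illustration, not the method used to build the corpus frequency list: the tokeniser, the function names and the sample sentence are all assumptions, and 40,000 is used as a stand-in for the reported "more than 40,000" occurrences of “he”.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def rank_of(word, counts):
    """Return the 1-based frequency rank of a word, or None if it never occurs."""
    if word not in counts:
        return None
    ranked = [w for w, _ in counts.most_common()]
    return ranked.index(word) + 1


def pronoun_ratio(counts, male="he", female="she"):
    """Return how many times the male form occurs for each female form."""
    if counts[female] == 0:
        raise ValueError("the female form does not occur in this text")
    return counts[male] / counts[female]


# Hypothetical sample text, standing in for the newspaper articles.
sample = "He said he would go. She said no. He left, and he did not return."
counts = Counter(tokenize(sample))

print(counts["he"], counts["she"])  # 4 1
print(rank_of("he", counts))        # 1
print(pronoun_ratio(counts))        # 4.0

# The same arithmetic on the reported totals. 40,000 is a lower bound,
# because the article only says "more than 40,000".
reported_ratio = 40_000 / 11_765
print(round(reported_ratio, 1))     # 3.4
```

A real tokeniser would also need to handle curly apostrophes, hyphenated words and headline capitalisation, any of which could shift the counts slightly.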

Unfortunately, we don’t have comparable data from the period of Gillard’s leadership. It seems likely that, with a female prime minister, this gap would have been narrower.

If you are a “he” or “she” in a text, it means you have a prominent grammatical role – you are the subject of the clause, and you have lasted long enough in the story to graduate from proper name to pronoun.

We can also examine the frequency of combinations of words, like “he said” versus “she said”. Across the articles in the corpus, “he said” appeared 9,892 times compared with 2,709 for “she said” – a ratio of roughly 3.7 to 1. That tells us something important about whose voices are being heard in Australian news media.
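Counting two-word combinations works the same way as counting single words, except that the text is scanned for adjacent pairs of tokens. The sketch below is illustrative only: the helper names and the sample sentence are assumptions, and it ignores sentence boundaries, so a "he" that ends one sentence followed by a "Said" that starts the next would be counted as a match.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def count_phrase(tokens, phrase):
    """Count how often the words of `phrase` appear consecutively in `tokens`."""
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
    )


# Hypothetical sample text, standing in for the newspaper articles.
sample = '"We will win," he said. "Not likely," she said. He said it again.'
tokens = tokenize(sample)

he_said = count_phrase(tokens, "he said")
she_said = count_phrase(tokens, "she said")
print(he_said, she_said)  # 2 1

# The same division applied to the reported corpus counts.
reported = 9_892 / 2_709
print(round(reported, 2))  # 3.65
```

Because matching is done on whole tokens, “she said” is never miscounted as an instance of “he said”, which a plain substring search would do.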

Pronouns aren’t the only indicator. The use of proper names – such as Peter, Paul and Malala – also gives us clues. The table below is a list of the top 21 names published in the year’s worth of news articles. Why 21? The top 20 are all male names. It was not until I got to the 21st proper name in the corpus that I found a female name.

There’s a good chance the name “Julia” would have appeared in the top five when Gillard was Prime Minister. It’s no coincidence that the top female name in the list is “Julie” – the same name as Australia’s Foreign Minister.


[Table: the top 21 proper names in the corpus. Source: Author provided/Lancaster University Corpus Approaches to Social Science Research Centre, CC BY-ND]

Notably, but not surprisingly, the top male names are all Anglo. “Waleed” came in at number 1,661, with 41 instances.

There are other gender-specific words in our language, such as the pronouns “his” and “her”, the titles “Mr”, “Mrs” and “Ms”, the nouns “man” and “woman”, and the adjectives “male” and “female”. The number of times each of these words was used across the newspapers is listed in the table below. When added up, the male terms outnumber the female by 3 to 1.


[Table: frequencies of gender-specific words in the corpus. Source: Author provided/Lancaster University Corpus Approaches to Social Science Research Centre, CC BY-ND]

In a couple of these contrasts, the female term outnumbers the male. “Women” is more commonly used than “men”. It is not immediately obvious why this is the case. The most typical expression associated with “men” is “men and women”; for “women” it is “women and children” or “women and girls”.

That the word “female” is used nearly twice as often as “male” is easy to explain. How often is a person described as a “male cricketer” or a “male mathematician”? “Female” is also common in this corpus because of the media’s interest in “female genital mutilation”.

The ratio of the word “cricket” to “netball” is 16 to 1.

Is Australia’s media coverage more male dominated than other comparable countries? The best data we have for this comparison is the News-on-the-Web (NOW) corpus produced at Brigham Young University in the United States. The NOW corpus is huge: it contains nearly 3.4 billion words from online newspapers and magazines in 20 countries from 2010 to the present. More data are being added every day.

The table below shows the frequencies per million words of “he” and “she” for Australia, New Zealand, Britain, the United States, Canada, India and Pakistan. I’ve used these frequencies to calculate the ratio of “he” to “she” for each of these countries.
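Frequencies per million words make corpora of very different sizes comparable: a raw count is divided by the total number of words in that corpus and multiplied by one million. The sketch below shows the calculation under stated assumptions. The function name is hypothetical, the worked figures come from the Australian newspaper corpus described earlier rather than from NOW, and 7.4 million is the approximate size quoted for that corpus.

```python
import math


def per_million(count, total_words):
    """Convert a raw count into a frequency per million words."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return count / total_words * 1_000_000


# Approximate figures for the Australian newspaper corpus:
# 11,765 occurrences of "she" in close to 7.4 million words.
corpus_size = 7_400_000
she_pm = per_million(11_765, corpus_size)
print(round(she_pm))  # 1590

# "He" appeared more than 40,000 times; 40,000 is used as a lower bound.
he_pm = per_million(40_000, corpus_size)

# Within one corpus, normalising does not change the he/she ratio,
# because both counts are divided by the same total.
print(math.isclose(he_pm / she_pm, 40_000 / 11_765))  # True
```

The normalised figures matter when comparing countries, because each country's section of a corpus contains a different number of words.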



The table below shows the number of times the words “he” and “she” appeared in online news and magazine articles published in the 20 countries included in the NOW corpus since 2010.



The data are a rich source for understanding how Australian newspapers project Australian stories and voices. You can download the full corpus here (scroll down to the link “Australia 2015/2016” to download the data). You can find the word frequency list I compiled for the Australian media corpus here.

If you would like to start a conversation on this topic, take a look at the data yourself and share your findings on Twitter using the hashtag #ozmediacorpus.