I am Professor of English Language and Linguistics at Lancaster University. I am also the Director of the ESRC's Centre for Corpus Approaches to Social Science (CASS). My work is in the area of corpus linguistics - the study of language using millions, sometimes billions, of words of evidence. I have been collecting and analyzing such data - called corpora - for nearly forty years. I have worked with a range of languages including Arabic, Chinese, English, French and Nepali, and with various industrial partners including Blackberry, British Telecom, Cambridge University Press, IBM and Nokia.
Corpora have revolutionized the study of language. Rather than relying on our intuitions about language use, we can now study language as it is actually used by speakers and writers today. We can also study language from the past, or even gain new insights into ancient 'dead' languages where a written record of them exists.
Corpora have vastly improved language processing technologies, dictionaries, language teaching materials and language tests, to name but a few of the many benefits they have yielded. Yet they have also opened a new window into discourse - how we talk about or construct ideas in language. It is largely in the study of discourse that I am, with colleagues from a wide range of social sciences, exploring the capacity that corpora have for transforming our approach to issues as varied as climate change, financial reporting, hate speech and religious identities.