I am a computational linguist: a cognitive scientist who studies language and knowledge using computational methods. More specifically, I would describe myself as a data-driven formal semanticist: I use evidence from corpora, psychology and the neural sciences, analyzed using statistical and machine learning methods, to test hypotheses formulated in mathematically precise ways, and/or to develop new such theories. Thus, my work on anaphora--e.g., on salience, the role of lexical knowledge, and on reference to abstract objects--has been driven by the analysis of corpora (which, in most cases, we created) and of disagreements in corpus annotation--most recently, using the Phrase Detectives game-with-a-purpose to collect such data. My work on the organization and acquisition of conceptual knowledge--most recently, testing Pustejovsky's hypothesis about polysemy in collaboration with Yuan Tao and Andrew Anderson--has involved using machine learning techniques to acquire evidence about commonsense from corpora and brain data. Finally, my work on the semantics of dialogues has been fuelled first by evidence gathered from the TRAINS corpora, and more recently by work on the Bielefeld Toy Airplane corpus, collected by Hannes Rieser and his collaborators. I am also heavily involved in the application of such methods to real-world problems--particular interests include deception detection and Arabic NLP. I am a full professor at the School of Computing and Electronic Engineering, University of Essex, and a member of the University's Language and Computation group and Digital Lifestyles Centre. I am also a supervisor in the IGGI Doctoral training centre in Intelligent Games and Game Intelligence and a PI in the forthcoming Centre for Human Rights and Information Technology in the Era of Big Data.
My current projects include the SENSEI project on using discourse information to support summarization of conversations, including online forums; the Phrase Detectives game-with-a-purpose (using crowdsourcing to create resources for anaphora resolution); the Concepts in Brain and Language project in collaboration with the University of Trento, devoted to studying conceptual representations by combining brain imaging with techniques for acquiring concepts from corpora, with its spinoffs ADAM and Deep Relations; the Deception in Text project with Tommaso Fornaciari; several projects on using NLP to support detecting human rights violations, including a KTP with Minority Rights Group on human rights violations in Iraq; the Brain and Emotions project, also in collaboration with Trento, on studying emotions using brain data; and the study of several aspects of anaphora resolution using the BART toolkit. Past projects include ARRAU (studying difficult cases of anaphora); the GALATEAS EU project on using HLT techniques to facilitate the analysis of query logs, e.g., in digital libraries such as Bridgeman's; the 2007 Johns Hopkins workshop ELERFED (using lexical and encyclopedic knowledge for entity disambiguation); GNOME (generating referring expressions); and LiveMemories (using information extraction to help share knowledge).
If you are interested in studying computational linguistics, whether from a cognitive or an engineering perspective, consider our group! I am happy to supervise third-year, MSc- and PhD-level projects in Information Extraction/Text Mining (particularly coreference and Arabic IE), spoken dialogue interaction with intelligent environments (as in this example), and crowdsourcing. My research and my ongoing and past projects are described here.