“Why linguists are needed: The severe limitations of big data analysis of linguistic corpora”

George Lakoff

University of California, Berkeley

The Berkeley MetaNet Project was funded for three years by IARPA, the research agency of the U.S. Office of the Director of National Intelligence, on an open source basis. IARPA wanted a completely automated machine learning approach to analyzing the conceptual metaphors in its vast corpora of documents. Luckily, it also put together an ace team of Berkeley linguists and psycholinguists from California campuses.

This talk will go over why the big data statistical methods by themselves were hopeless. The Linguistics Group, on the other hand, used computational methods to set up a wiki database of many hundreds of conceptual metaphor mappings, over a hundred frames, and many dozens of image schemas, and to build a very simple embodied construction grammar (ECG) parser incorporating Karen Sullivan’s insights on the way conceptual metaphor functions in grammar.

We did find the ability to process large corpora extremely useful. We also found that when we took the results of the corpus processing as input and applied even a simple metaphorical ECG parser, together with links to the cascades of relationships in the wiki database, we began to get some interesting analyses. But it took a great team of linguists, and lots of serious linguistic research, to get even reasonable partial analyses at all.

The talk will discuss details to give you a feel for why linguists are needed for serious analyses of linguistic data.

George Lakoff is Richard and Rhoda Goldman Distinguished Professor of Cognitive Science and Linguistics at the University of California, Berkeley. He is one of the founders of conceptual metaphor theory and of the field of cognitive linguistics. His numerous books and articles concern the “metaphors we live by”, the nature of conceptual categories and frames, and how both of these structure abstract domains such as mathematics, philosophy and political reasoning. The present lecture relates to his research at the International Computer Science Institute and UC Berkeley on embodied cognition and the Neural Theory of Language.

The lecture is organized by the Language Use and Cognition chair group of the VU Faculty of Humanities and is sponsored by the Network Institute and the Spinoza Prize project “Understanding Language by Machines” of Prof. Dr. Piek Vossen in the Computational Lexicology & Terminology Lab of the VU Faculty of Humanities.


