Piek Vossen, NWO Spinoza laureate 2013, has announced his plans for the 2.5 million euro NWO Spinoza Prize he received.
“Understanding of Language by Machines – an escape from the world of language” – Spinoza prize project Vossen: SPI 30-673 (2014–2019)
The goal of the Spinoza project “Understanding of language by machines” (ULM) is to develop computer models that assign to language a deeper meaning that approximates human understanding, and to use these models to automatically read and understand text. Current approaches to natural language understanding consider language as a closed world of relations between words. Words and texts are, however, highly ambiguous and vague. People do not notice this ambiguity when using language within their social communicative context. This project tries to get a better understanding of the scope and complexity of this ambiguity and of how to model the social communicative contexts to help resolve it.
The project is divided into four subprojects, each investigating a different aspect of assigning meaning:
- ULM-1: The borders of ambiguity: ULM-1 will explore the closed world of language as a system of word relations. The goal is to define the problem more precisely and to find the optimal solution given the vast volumes of textual data that are available. This project starts from the results obtained in the DutchSemCor project.
- ULM-2: Word, concept, perception and brain: ULM-2 will cross the borders of language and relate words and their meanings to perceptual data and brain activation patterns.
- ULM-3: Stories and world views as a key to understanding language: ULM-3 will consider the interpretation of text built up from words as a function of our ways of interacting with the changing world around us. We interpret changes from our world views on the here and now and on the future. Furthermore, we structure these changes as stories along explanatory motivations. This project builds on the results of the European project NewsReader.
- ULM-4: A quantum model of text understanding: ULM-4 is a technical project that investigates a new model of natural language processing. Current approaches are based on a pipeline architecture, in which the complete problem is divided into a series of smaller isolated tasks, e.g. tokenization, part-of-speech tagging, lemmatisation, syntactic parsing, recognition of entities, detection of word meanings. In this new model, none of these tasks is decisive and the final interpretation is left to higher-order semantic and contextual models. This project also builds on the findings of previous (KYOTO) and ongoing (OpeNER and NewsReader) European projects and of the national project BiographyNet, carried out at the VU University Amsterdam. The goal is to develop a new model of natural language processing in which text is interpreted in a combined top-down and bottom-up process.
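
The contrast described under ULM-4, between a pipeline in which each step commits to one answer and a model in which no step is decisive, can be illustrated with a toy sketch. This is not the project's model: the lexicon `LOCAL`, the compatibility table `COMPAT`, all scores, and both function names are hypothetical and chosen only to show how deferring the decision to a contextual score can change the final interpretation.

```python
from itertools import product

# Hypothetical local candidates per token: (reading, local score).
LOCAL = {
    "fisherman": [("person", 1.0)],
    "bank": [("finance", 0.7), ("river", 0.3)],
}

# Hypothetical contextual compatibility between readings of adjacent tokens.
COMPAT = {
    ("person", "finance"): 0.2,
    ("person", "river"): 0.9,
}


def pipeline_reading(tokens):
    """Pipeline style: each token gets its locally best reading, with no revision."""
    return [max(LOCAL[t], key=lambda c: c[1])[0] for t in tokens]


def deferred_reading(tokens):
    """Deferred style: keep all candidates and let a contextual score decide jointly."""
    best, best_score = None, float("-inf")
    for combo in product(*(LOCAL[t] for t in tokens)):
        score = 1.0
        for _, local in combo:
            score *= local  # bottom-up evidence from each isolated task
        readings = [r for r, _ in combo]
        for left, right in zip(readings, readings[1:]):
            # top-down evidence from context; 0.5 is an assumed neutral default
            score *= COMPAT.get((left, right), 0.5)
        if score > best_score:
            best, best_score = readings, score
    return best


tokens = ["fisherman", "bank"]
print(pipeline_reading(tokens))  # ['person', 'finance']
print(deferred_reading(tokens))  # ['person', 'river']
```

In this toy setting the pipeline picks the more frequent "finance" reading of "bank", while the joint scoring prefers "river" because the context outweighs the local preference. Exhaustive enumeration is used only for clarity; it would not scale to real text.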