Current Research

  • Understanding of Language by Machines – an escape from the world of language: Spinoza prize project Vossen, SPI 30-673 (2014-2020)
    • The goal of the Spinoza project “Understanding of Language by Machines” (ULM) is to develop computer models that can assign deeper meaning to language, approximating human understanding, and to use these models to automatically read and understand text. Current approaches to natural language understanding treat language as a closed world of relations between words. Words and texts, however, are highly ambiguous and vague. People do not notice this ambiguity when using language within their social communicative context. This project tries to gain a better understanding of the scope and complexity of this ambiguity, and of how to model the social communicative contexts that help resolve it. The project is divided into four subprojects, each investigating a different aspect of assigning meaning:

      • ULM-1: The borders of ambiguity: ULM-1 will explore the closed world of language as a system of word relations. The goal is to more properly define the problem and find the optimal solution given the vast volumes of textual data that are available. This project starts from the results obtained in the DutchSemCor project.
      • ULM-2: Word, concept, perception and brain: ULM-2 will cross the borders of language and relate words and their meanings to perceptual data and brain activation patterns.
      • ULM-3: Stories and world views as a key to understanding language: ULM-3 will consider the interpretation of text, built up from words, as a function of our ways of interacting with the changing world around us. We interpret changes from our world-views on the here and now and on the future. Furthermore, we structure these changes as stories along explanatory motivations. This project builds on the results of the European project NewsReader.
      • ULM-4: A quantum model of text understanding: ULM-4 is a technical project that investigates a new model of natural language processing. Current approaches are based on a pipeline architecture, in which the complete problem is divided into a series of smaller isolated tasks, e.g. tokenization, part-of-speech tagging, lemmatisation, syntactic parsing, recognition of entities, and detection of word meanings. In the new model, none of these tasks is decisive and the final interpretation is left to higher-order semantic and contextual models. This project also builds on the findings of the previous European project KYOTO, the ongoing European projects OpeNER and NewsReader, and the national project BiographyNet, all carried out at the VU University Amsterdam. The goal is to develop a new model of natural language processing in which text is interpreted in a combined top-down and bottom-up process.
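The pipeline architecture that ULM-4 seeks to move beyond can be sketched in a few lines. The sketch below is a toy illustration, not CLTL software: each stage is an isolated task whose output feeds the next, so every decision is final once a stage has run.

```python
# Toy sketch of a classic NLP pipeline: isolated stages in a fixed order.
# All stage implementations are invented placeholders for illustration.

def tokenize(text):
    """Split raw text into tokens (toy whitespace tokenizer)."""
    return text.split()

def pos_tag(tokens):
    """Assign a part-of-speech tag to each token (toy heuristic)."""
    return [(t, "NOUN" if t[0].isupper() else "OTHER") for t in tokens]

def recognize_entities(tagged):
    """Mark capitalized nouns as candidate named entities (toy rule)."""
    return [t for t, tag in tagged if tag == "NOUN"]

def pipeline(text):
    """Run the stages in sequence: each stage's decision is final,
    which is exactly the property ULM-4 aims to relax."""
    return recognize_entities(pos_tag(tokenize(text)))

print(pipeline("Amsterdam hosts the VU campus"))  # → ['Amsterdam', 'VU']
```

In the combined top-down and bottom-up model described above, later semantic and contextual evidence could instead revise these early, locally made decisions.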
  • Framing situations in the Dutch Language: NWO Vrije Competitie Geesteswetenschappen (2019-2023)
    • Project leader of Dutch FrameNet. The project’s objectives are 1) to create a unique data set in which similar situations are framed by many different sources and texts, using a newly developed data-to-text method, 2) to capture the variation in framing these situations in Dutch and other languages, 3) to capture the semantic-pragmatic factors underlying the usage of different frames for similar situations, and 4) to develop semantic frame and role annotation software. An additional concrete outcome of this project is a Dutch FrameNet contributing to the renowned Berkeley Multilingual FrameNet project, which assesses the cross-linguistic validity of frames and investigates cross-linguistic variation in framing. The insights, resources and technologies created by this project provide new possibilities for (industrial) data analysts and for researchers from the Humanities and Social Sciences.
  • CLARIAH-PLUS: Nationale roadmap grootschalige onderzoeksfaciliteiten NWO (2019-2023)
    • Member of the Kernteam and Technical Officer.
  • Make Robots Talk: VU University (2018-ongoing)
    • Project leader of a project in which CLTL recently bought a robot and now wants to plug in its natural language processing technology, so that the robot can respond to people in an intelligent way and turn into a wise bot.
  • Document Forensics: The Network Institute Academy Projects 2018 (2018-2019)
    • Project coordinator of DF, in which we apply methods to extract relevant concepts (e.g., the names of suppliers, or the type of relationship between companies and executive management) from unstructured documents (e.g., news) as well as semi-structured ones (e.g., contracts and financial documents) in order to populate knowledge graphs and link them to publicly available knowledge graphs.
  • OpeNER (“Open Polarity Enhanced Named Entity Recognition”): 7th EU Framework project ICT-296451 (2012-2014)
    • Project partner of OpeNER. Currently there is a plethora of companies offering online Sentiment Analysis (SA) services; the majority are generic and monolingual. SA is a complex field at the edge of the current state of the art in NLP. Two key elements are Lexical Resources and Named Entity Recognition and Classification (NERC). Both elements allow for the measurement of “what” is said about “whom”. Building tools and resources for Opinion Mining (OM) is expensive: these basic technologies, which are fundamental to market qualification for enterprises offering OM services, are costly to develop. OpeNER proposes the reuse and repurposing of existing lexical resources, Linked Data and the broader Social Internet. OpeNER will focus on ES, NL, FR, IT, DE and EN, and create a generic multilingual graduated sentiment data pool reusing existing language resources (WordNets, Wikipedia) and automatic techniques. The Sentiment Lexicon will supplement popular or proprietary lexicons and will be expressed in a new mark-up format. Multilingualism and cultural skew in OM increase complexity; the sentiment values will therefore be culturally normalised to allow a “like-for-like” comparison. Tools for extension to other languages and domains will be provided. Fine-grained NERC, which is critical in SA, will be addressed; NERC will be done by means of “Wikification” and Linked Data. Extensions to the generic OM system will be created for validation in the Tourism domain with partner SMEs and an End User Advisory Board. OpeNER will also create an online development portal and community to host data, libraries, APIs and services. Tasks focused on implementing models to ensure long-term self-sustainability, including options for Open Licensing, are part of the project.
It will provide base qualifying technologies and a means for continued development and extension to other languages and domains, freeing SMEs to concentrate their efforts on providing innovative solutions that meet market needs rather than on expensive development of core technologies.
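The core idea of a graduated sentiment lexicon can be illustrated with a minimal sketch. The lexicon entries and score scale below are invented examples, not OpeNER data: each word carries a signed polarity strength, and a text's polarity aggregates the scores of its known words.

```python
# Toy lexicon-based sentiment scoring: a graduated sentiment lexicon
# maps words to polarity values in [-1, 1]. Entries are invented.

SENTIMENT_LEXICON = {
    "excellent": 0.9,
    "good": 0.6,
    "poor": -0.6,
    "terrible": -0.9,
}

def polarity(text):
    """Sum the graduated polarity values of known words;
    unknown words contribute 0. Returns a signed score."""
    return sum(SENTIMENT_LEXICON.get(w.lower(), 0.0) for w in text.split())

print(polarity("The hotel staff was excellent but the food was poor"))
```

Cultural normalisation, as described above, would adjust these raw values per language and culture so that scores remain comparable across the six project languages.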
  • CLARIAH: Nationale roadmap grootschalige onderzoeksfaciliteiten NWO (2015-2019)
    • Member of the Kernteam and Technical Officer within WP3 of CLARIAH, responsible for the theme Interoperability.
  • QuPiD2 – From Text to Deep Data: AAA Data Science Program (2015-2019)
    • Project coordinator of this project, which develops a model that provides a representation of things in the (real or assumed) world and allows us to indicate the perspective of different sources on them. In other words, we aim to provide a framework that can represent what is said about a topic, a person or an event, and how this is said in and by various sources, making it possible to place alternative perspectives next to each other. We develop software to detect these perspectives in texts and represent the output according to our formal model, which is called GRaSP (Grounded Representation and Source Perspective). GRaSP is an overarching model that provides the means to: (1) represent instances (e.g. events, entities) and propositions in the (real or assumed) world, (2) relate them to mentions in text using the Grounded Annotation Framework, and (3) characterize the relation between mentions of sources and targets by means of perspective-related annotations such as attribution, factuality and sentiment.
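The three-layer structure that GRaSP describes (instances, mentions, perspective annotations) can be sketched as a small triple store. The predicate names, identifiers and values below are illustrative assumptions, not the actual GRaSP or GAF vocabulary.

```python
# Toy triple store sketching the GRaSP layers: an instance in the
# (real or assumed) world, a mention grounding it in text, and
# perspective annotations (attribution, factuality, sentiment) on
# that mention. All names below are invented for illustration.

triples = set()

def add(subj, pred, obj):
    triples.add((subj, pred, obj))

# An event instance, and a mention grounding it in a document.
add("ex:election2016", "rdf:type", "sem:Event")
add("ex:election2016", "gaf:denotedBy", "doc1#char=120,135")

# A source's perspective expressed through that mention.
add("doc1#char=120,135", "grasp:attributedTo", "ex:reporterA")
add("doc1#char=120,135", "grasp:hasFactuality", "CERTAIN")
add("doc1#char=120,135", "grasp:hasSentiment", "NEGATIVE")

def perspectives_on(instance):
    """Collect the perspective annotations attached to the mentions
    that ground a given instance."""
    mentions = {o for s, p, o in triples
                if s == instance and p == "gaf:denotedBy"}
    return sorted((p, o) for s, p, o in triples
                  if s in mentions and p.startswith("grasp:"))

print(perspectives_on("ex:election2016"))
```

Placing alternative perspectives next to each other, as the project aims to do, then amounts to comparing the annotation sets gathered from mentions in different sources.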
  • Vossen University Research Fellows: VU University (2014-ongoing)
    • The VU University Research Fellowship (URF) is a programme developed for a select number of internationally renowned scientists at VU University Amsterdam. It is a token of appreciation and a public tribute to the university’s most excellent scientists for their extraordinary research performance. These scientists are entitled to reward the best student of their choice with a University Research Fellowship that will carry their own name.
  • Deep models of semantic knowledge (DemoSem), National Science Fund Ministry of Education and Science, Bulgaria (2017-2019)
    • Member of the research team of this project, whose main goal is to study and create adequate linguistic models, with deep mathematical methods, of the semantic knowledge in lexicons and texts. As a key element of high-quality semantic management we consider Word Sense Disambiguation, which will be used as an experimental approach for testing the designed models.
  • e-Humanities Amsterdam (2013-ongoing)
    • Project Leader on behalf of the VU University in the Centre for Digital Humanities Amsterdam: a collaboration between the University of Amsterdam, VU University Amsterdam and the Royal Netherlands Academy of Arts and Sciences. Within the field of Digital Humanities, researchers and students focus on digital or digitized sources and methods of research. Digital data concerning language, art, music, literature and media allow researchers to discover new patterns, concepts and motives, eventually raising new research questions. The Centre for Digital Humanities Amsterdam facilitates so-called embedded research projects, in which research questions from the humanities are approached using techniques and concepts from the field of Digital Humanities. In these short and intensive projects, which last between 6 and 12 months, researchers collaborate with private partners and deliver proofs of concept. The centre preferably initiates embedded research projects in the context of larger projects in which expertise from the humanities and industry is brought together.
  • OpenSourceWordnet: grant Taalunie (2013-2014) and ongoing
    • Global WordNet Grid: a GWA Project (2006-ongoing)
      • In 2006 Vossen launched the Global WordNet Grid: the building of a completely free worldwide wordnet grid. This grid will be built around a shared set of concepts, such as the Common Base Concepts used in many wordnet projects. These concepts will be expressed in terms of WordNet synsets and SUMO definitions. People from all language communities are invited to upload synsets from their language to the Grid. Gradually, the Grid will then come to represent all languages. The Grid will be available to everybody and will be distributed completely free of charge.
    • Global Wordnet Association (2000-ongoing)
      • Vossen is Founder and President of the Global WordNet Association. He founded GWA (with Christiane Fellbaum of Princeton University) in 2000 as a public and non-commercial organization that provides a platform for discussing, sharing and connecting wordnets for all languages in the world.
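The central data structure of the Global WordNet Grid described above, a shared set of concepts to which each language community links its own synsets, can be sketched minimally. The concept identifier, lemmas and function names below are invented for illustration, not actual Grid data.

```python
# Toy sketch of the Grid idea: language communities upload their own
# synsets against shared concept identifiers, enabling cross-lingual
# lookup through the shared concept. All IDs and words are invented.

grid = {}  # concept_id -> {language: set of lemmas (a synset)}

def upload(concept_id, language, lemmas):
    """A language community links its synset to a shared concept."""
    grid.setdefault(concept_id, {})[language] = set(lemmas)

upload("ili-example-dog-n", "en", ["dog", "domestic dog"])
upload("ili-example-dog-n", "nl", ["hond"])

def translations(concept_id, source_lang, target_lang):
    """Follow the shared concept from one language's synset to
    another's -- the basic cross-lingual lookup the Grid enables."""
    entry = grid.get(concept_id, {})
    if source_lang in entry and target_lang in entry:
        return sorted(entry[target_lang])
    return []

print(translations("ili-example-dog-n", "en", "nl"))  # → ['hond']
```

As more communities upload synsets for a concept, the same lookup extends to every language pair without any pairwise mapping effort, which is what makes the shared-concept design scale.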