Invited online keynote talk on “Framing situations in the Dutch Language”, Schultink lecture, LOT Winter School, January 20, 2022, from 16:30 to 17:30.
We use language to tell stories and respond to situations in the world. We may describe these situations in many different ways, and the choices we make often reflect our perspective. Although there are many corpora capturing language, hardly any of them also represent the actual situations that the texts refer to, let alone indicate which texts refer to the same situation. Event coreference corpora could serve this purpose, as they are annotated for mentions of the same event. However, available event coreference corpora are very small and exhibit hardly any ambiguity (there is typically one referent for each expression) or variation (there are only one or a few expressions for each referent) (Ilievski et al., 2016; Postma et al., 2016). Without texts that refer to the same or similar situations, or without knowing which texts do, it is difficult to investigate the different ways in which people refer to the same situation. This also hampers the development of systems that automatically resolve (cross-document) event coreference, and it hampers efforts to understand and develop technology that detects how events are framed.
Imagine you want to create a text corpus to represent language use around murder. How would you proceed? You could take a public corpus such as the Gigaword corpus (Napoles et al., 2012) and search for texts using keywords. How many murders will you find, and will you find all of them? Referring to an event as a murder is itself already subjective, so a keyword search may miss situations that some people describe differently. Even if you retrieve a substantial number of texts about murders, you still do not know which texts refer to the same murder. Such referential anchoring is, however, a prerequisite for relating differences in how these murders are framed. How can we find all the typical and different ways of framing a murder without referential grounding?
The project Framing Situations in the Dutch Language (http://dutchframenet.nl) aims to tackle this problem using the data-to-text method described in Vossen et al. (2018b). This method compiles massive amounts of text (so-called reference texts) in different languages that are referentially grounded to specific event instances, which are represented as so-called microworlds. We not only ground these texts but also automatically disambiguate mentions of these events in the texts, following a one-sense-per-event-type principle. Furthermore, we automatically derive the dominant vocabulary and the dominant FrameNet frames (Baker et al., 2003) for different types of events.
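As a rough illustration of what referential grounding makes possible, the following minimal sketch groups texts by the incident they refer to and ranks the vocabulary of one event type. It is not the project's software: the `ReferenceText` fields, the function names, the toy sentences, and the use of document frequency as a stand-in for "dominant vocabulary" are all assumptions made for this example.

```python
import re
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceText:
    """A text grounded to one incident (all field names are hypothetical)."""
    incident_id: str  # identifier of the event instance the text refers to
    event_type: str   # e.g. "murder"
    language: str     # e.g. "nl" or "en"
    text: str


def group_by_incident(texts):
    """Collect the texts that refer to the same incident."""
    groups = defaultdict(list)
    for t in texts:
        groups[t.incident_id].append(t)
    return dict(groups)


def dominant_vocabulary(texts, event_type, language, top_n=3):
    """Rank words by how many texts of this event type contain them.

    Document frequency is an assumed, simplistic proxy for dominance.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    doc_freq = Counter()
    for t in texts:
        if t.event_type == event_type and t.language == language:
            doc_freq.update(set(re.findall(r"\w+", t.text.lower())))
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Invented toy data: two texts about one incident, one text about another.
corpus = [
    ReferenceText("incident-1", "murder", "en", "Man killed in shooting"),
    ReferenceText("incident-1", "murder", "en", "Police investigate fatal shooting"),
    ReferenceText("incident-2", "murder", "en", "Suspect held after fatal stabbing"),
]

by_incident = group_by_incident(corpus)
top_words = dominant_vocabulary(corpus, "murder", "en", top_n=2)
```

Because every text carries an incident identifier, texts in `by_incident["incident-1"]` can be compared directly as alternative framings of the same event, which a keyword search over an ungrounded corpus cannot guarantee.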
We present our first results of data acquisition, together with the first data and software release. We introduce the notion of cross-lingual referential corpora. These corpora consist of texts that refer to exactly the same incidents and to similar types of incidents. The referential grounding allows us to analyse the framing of these incidents in different languages and across different texts. During the project, we will use the automatically generated data to study linguistic framing as a phenomenon and to build framing resources such as extensions to FrameNet, lexicons, corpora, FrameNet annotation software, and coreference resolution software. We expect to capture greater variation in framing than traditional approaches to building such resources do.