Invited consultant of the project: “Deep models of semantic knowledge” of the Bulgarian WordNet



Invited consultant and member of the research team of the project: “Deep models of semantic knowledge” by Kiril Simon and Petya Osenova of the Bulgarian WordNet.

High-quality semantic management of texts is a central factor in areas such as the Semantic Web, Big Data and the Internet of Things, where information is communicated between machines and humans. Semantic management is an interdisciplinary area of research. It includes language models as well as mathematical and computational methods for handling the interaction between the linguistic and encyclopaedic knowledge in lexicons and ontologies, together with their contextual usage in texts. The main goal of the project is to study and create adequate linguistic models, supported by deep mathematical methods, over the semantic knowledge in lexicons and texts.
We consider Word Sense Disambiguation (WSD) a key element of high-quality semantic management, and it will be used as the experimental task for testing the designed models.

The objectives are as follows:
a) To linguistically model the main difficulties in identifying, combining and representing the semantic knowledge encoded in lexicons and realized in corpora.
b) To mathematically model certain features of graph-based and deep neural network algorithms that would help in representing semantic knowledge and in scaling its identification, combination and extraction in WSD.
c) To integrate deep neural network models within graph-based approaches to WSD.
d) To create test suites (benchmarks) for the evaluation of the developed formal models.

The hypotheses are as follows:

a) The simultaneous exploration of lexicons and corpora will improve the quality of the required semantic knowledge.
b) The deep neural network models will complement the achievements of the knowledge graph-based ones, especially in areas where it is difficult to identify relations in advance and where large volumes of text provide global but often highly ‘disturbed’ connectedness.
c) A model will be developed that improves on the current state of WSD for Bulgarian and potentially for other languages.