Welcome to the ULD Webpage

The Unit for Linguistic Data is concerned with the creation, improvement and maintenance of linguistic data (also known as language resources) through a variety of methods. By linguistic data we mean a range of data types that are useful to researchers in linguistics and natural language processing. These fall into four major categories: firstly, lexical data contains descriptions of words and their meanings, syntax and relations; secondly, corpora consist of collections of texts made for a particular purpose; thirdly, language descriptions describe typological properties of languages to enable comparative studies; finally, metadata describes language resources and their availability.

As its primary research method, the group explores the use of linked data technologies, in the form of Linguistic Linked Open Data (LLOD), for processing linguistic data. This is done through the development of several key tools and resources that use linked data as a key part of their mechanism. Firstly, Naisc is a novel tool developed by the group for linking together resources of different kinds, and it has been applied to the task of linking lexicographical resources in the context of the ELEXIS project. Secondly, Teanga is a tool that enables the construction of pipelines of NLP tools that can be composed and integrated through the use of linked data and standards for linguistic data, such as the OntoLex-Lemon standard developed in this project. Finally, ULD maintains and develops several resources for discovering linguistic data, including the Linghub website as well as the Linked Open Data Cloud and its Linguistic Linked Open Data Subcloud. In the context of the Prêt-à-LLOD project, ULD is further exploring how the quality and availability of resources can be improved.

One of the major uses of linguistic data is the application of already developed NLP technologies to new languages and domains. As such, a strong focus of this group’s work is under-resourced languages. There is much ongoing work on the development of technologies for minority languages, as well as an active collaboration with the Irish Department and the Moore Institute on the development of NLP techniques for historical languages, in particular Old Irish. Furthermore, we are working on expanding WordNet to many under-resourced languages by means of machine translation.