I am a research fellow at the Insight Centre for Data Analytics at the Data Science Institute (DSI) and Adjunct Lecturer at the School of Computer Science @NUI Galway. I work in the Unit for Natural Language Processing (UNLP) under the supervision of Dr Paul Buitelaar, where my main research focuses on terminology and ontology translation with statistical as well as neural machine translation (SMT/NMT). I defended my PhD thesis with the title: “Machine Translation of Domain-Specific Expressions within Ontologies and Documents”.
I studied the German language at the University of Ljubljana, Slovenia, where I defended my Diploma thesis “Named Entity Recognition for German and Slovene” under the supervision of Dr Stojan Bračič and Dr Špela Vintar. In 2009 I obtained my master’s degree in Computational Linguistics at the Ruhr University in Bochum, Germany. My thesis, under the supervision of Prof. Dr Ralf Klabunde, dealt with the extraction of semantic relations from the Slovene national corpus. After my studies, I worked for Lionbridge as a developer of language technologies for Slovene, and for the Slovenian project “Communication in Slovene”.
At Insight@NUIG (previously known as DERI) I worked on the challenges of multilinguality in language resources across different projects.
Research Topics: Statistical Machine Translation, Information Extraction, Computational Linguistics, Natural Language Processing
- Co-supervision of Bharathi Raja Asoka Chakravarthi on Machine Translation for under-resourced languages and Sina Ahmadi on cross-lingual data linking
- Teaching Assistant / Instructor for the Natural Language Processing module of the MSc in Data Analytics programme at NUI Galway (2016- )
- Organiser of the Multilingualism at the intersection of Knowledge Bases and Machine Translation (MomenT) Workshop
- Invited talk at Meet Central Europe (2018) on “The Neural Age of Machine Translation”, Budapest, Hungary
- Local organiser of Language, Data and Knowledge (LDK) 2017, Galway, Ireland
- Invited talk at “Translation Technology Terminology Conference (2014)” on Statistical Machine Translation and Terminology.
During my studies, several translation and conversational demos were deployed:
- Marvin, a conversational chatbot with major depressive disorder detection.
- GrumpyBot, a sequence-to-sequence deep learning chatbot using linked data.
- Insight META System, an SMT system accessible through an API request, supporting all official European Union languages. (CURRENTLY OFFLINE)
- ASISTENT, an SMT/NMT system translating between English and morphologically rich South Slavic languages (Slovene, Serbian, Croatian).
- OTTO, an SMT system for multilingual enhancement of ontologies. (CURRENTLY OFFLINE)
- IRIS, an SMT/NMT system translating between English and (less-resourced) Irish in both directions.
- TeTra, a system for extracting and translating domain-specific vocabulary. (CURRENTLY OFFLINE)
- BitterCorpus – Bilingual IT Terminology Annotated Corpus: annotated corpora for the evaluation of monolingual and bilingual domain-specific term extraction (with HLT FBK).
- PE²rr (PostEdited and ERRor annotated corpus) covers machine translations, their post-edited versions and error annotations of the performed edit operations.
- Polylingual WordNet extends WordNet to 23 languages by automatic translation and is released both as OntoLex JSON-LD and in the Global WordNet LMF format. This resource is available for re-use under the Creative Commons Attribution 4.0 License. To download the dataset, please visit the Polylingual WordNet webpage.