Internship on Information Mining

The Unit for Natural Language Processing at the Digital Enterprise Research Institute (DERI) of the National University of Ireland, Galway invites applications for an internship in information mining. The successful applicant will have the following responsibilities:

1. Research: topic clustering. We are currently extracting over 27,000 topics from scientific publications using NLP techniques, and need a reliable and efficient technique to cluster them so that they are more manageable in our expert-finding application’s user interface. This could be hierarchical topic clustering (super-topics and sub-topics in a tree structure) or another suitable approach identified through research.

2. Development: crawling. We are currently identifying further sources of both documents and metadata describing the documents, and need infrastructure to be set up and run to crawl this information effectively for subsequent information extraction. This will involve setting up and administering databases, metadata stores and open source crawling tools, as well as developing some custom screen-scraping tools.

3. Any other tasks required of them.

The work will be carried out in the context of an ongoing industrial research collaboration, and internally with DERI’s Unit for Information Mining and Retrieval. Internships are aimed at undergraduate and postgraduate students, last 3 to 12 months, and are remunerated at EUR 1000 per month (EUR 1200 for postgraduate students).

DERI is a leading research institute in semantic technologies that offers a stimulating, dynamic and multi-cultural research environment, excellent ties to research groups worldwide, close collaboration with industrial partners and up-to-date infrastructure and resources. The DERI Unit for Natural Language Processing focuses on applied research in ontology-based information extraction, semantic-level text mining and the use of linguistic and semantic methods in information retrieval. The unit develops methods for the efficient application of NLP tools in combination with domain semantics as specified in ontologies, thesauri and other knowledge organisation systems for relevant use cases. Research is carried out in close cooperation with the DERI Unit for Information Mining and Retrieval in the context of the DERI Semantic Information Mining stream, as well as with other DERI units.

Please send an expression of interest, preferably with links to relevant previous study and/or research experience, by August 1st to:

Dr. Fergal Monaghan
fergal dot monaghan at deri dot org