Resources

Saffron data

  • Saffron ACL data: includes the top 15 topics for each publication, ranked by Saffron score.
  • Sample domain models: contains three domain models, covering Computer Science, Food and Agriculture, and Biomedicine.
  • Sample topical hierarchies: includes three topical hierarchies, constructed automatically for Computational Linguistics, Finance, and Semantic Web.
  • Expert search evaluation: evaluation dataset for domain-specific expert search based on workshop program committees.

BitterCorpus – Bilingual IT Terminology Annotated Corpus

  • The dataset (generated in collaboration with HLT FBK) contains two annotated corpora produced to evaluate monolingual and bilingual domain-specific term extractors. To download the dataset, please visit this webpage.

PE²rr – PostEdited and ERRor annotated corpus

  • The PE²rr corpus contains machine translations, their post-edited versions, and error annotations of the edit operations performed. In addition, specific language-related issues are identified for each sentence where possible. The examples below illustrate the corpus for the English-Slovene translation direction. To download the dataset, please visit this webpage.

Polylingual WordNet

Subjunctive Mood Dataset

  • Example sentences for subjunctive mood: sentences that contain the subjunctive mood. [Reference]