UNLP provides open access to the following data sets and resources:
- Saffron ACL data: includes the top 15 topics for each publication, ranked by Saffron score.
- Sample domain models: contains three domain models, for Computer Science, Food and Agriculture, and the Biomedical domain.
- Sample topical hierarchies: includes 3 topical hierarchies automatically constructed for Computational Linguistics, Finance, and Semantic Web.
- Expert search evaluation: evaluation dataset for domain-specific expert search based on workshop program committees.
- Linked Open Data Cloud: visualization of the Linked Data datasets that have been published on the Data Hub.
- BitterCorpus – Bilingual IT Terminology Annotated Corpus: annotated corpora for the evaluation of monolingual and bilingual domain-specific term extraction (with HLT FBK).
- PE²rr (PostEdited and ERRor annotated corpus): covers machine translations, their post-edited versions, and error annotations of the edit operations performed.
- Polylingual WordNet: extends WordNet to 23 languages by automatic translation and is released both as OntoLex JSON-LD and in the Global WordNet LMF format. This resource is available for re-use under the Creative Commons Attribution 4.0 License. To download the dataset, please visit the Polylingual WordNet webpage.
Social Media Analysis
- Example Sentences for Subjunctive Mood: sentences containing the subjunctive mood. [Reference]
- Sentences from different domains tagged as suggestion or non-suggestion. Published at EMNLP 2015 and *SEM 2016.
- Tweet data annotated on each of four emotion dimensions: Valence, Arousal, Dominance and Surprise. The resource contains 2019 tweets, annotated both on a 5-point ordinal scale and through pairwise comparisons of tweet pairs. [Reference]
- Ekman annotated tweets: a set of 360 tweets containing common emoji, each labeled by many annotators for the presence or absence of each of Ekman's six emotion categories: Joy, Sadness, Surprise, Anger, Fear and Disgust. The emoji were removed from the tweets before annotation. [Reference]