UNLP provides open access to the following data sets and resources:

Knowledge Extraction

  • Saffron ACL data: includes the top 15 topics for each publication, ranked by Saffron score.
  • Sample domain models: contains 3 domain models, covering Computer Science, Food and Agriculture, and Biomedicine.
  • Sample topical hierarchies: includes 3 topical hierarchies automatically constructed for Computational Linguistics, Finance, and Semantic Web.
  • Expert search evaluation: an evaluation dataset for domain-specific expert search, built from workshop program committees.

Data Linking

Machine Translation

Social Media Analysis

  • Tweet data annotated on four emotion dimensions: Valence, Arousal, Dominance and Surprise. The resource contains 2,019 tweets, annotated both on a 5-point ordinal scale and through pairwise comparisons of tweet pairs. [Reference]
  • Ekman-annotated tweets: a set of 360 tweets containing common emoji, each labelled by multiple annotators for the presence or absence of each of Ekman's six emotion categories: Joy, Sadness, Surprise, Anger, Fear and Disgust. The emoji were removed from the tweets before annotation. [Reference]