As part of this project's efforts, the following resources and datasets have been created.

EveTAR Test Collection

– A Large-Scale Multi-Task Test Collection of Arabic Tweets. This is the first Arabic test collection for multiple information retrieval tasks on Twitter. It supports:

  • Event detection
  • Ad-hoc search
  • Timeline generation
  • Real-time summarization

– The paper titled EveTAR: A New Test Collection for Event Detection in Arabic Tweets provides an overview of this collection.
– It has two versions: v1.0, which supports event detection only, and v2.0, which supports four tasks with four subsets.

Download EveTAR v1.0

Download EveTAR v2.0


Journalists Questions on Twitter

A dataset collected as part of the work done for the paper titled What questions do journalists ask on Twitter.

Download ArQAT-JQ-Dataset-v1.0


Question Identification in Arabic Tweets

A dataset collected as part of the work done for the paper titled Identification of Answer-Seeking Questions in Arabic microblogs.

Download ArQAT-QI-Dataset-v1.0


Multiple Social Platform Test Collection

– This collection contains annotations used for the paper titled Building Bridges across Social Platforms: Answering Twitter Questions with Yahoo! Answers. The dataset consists of 4 files and also contains the identifiers of 177 aqweets (tweets containing answerable questions) used for training, 85 aqweet identifiers used for validation, and another 100 aqweet identifiers used for testing in the above-mentioned work.
– For more details, including usage instructions, please check the download link.

Download Dataset