As part of the efforts in this project, the following resources and datasets have been created.
EveTAR Test Collection
– A Large-Scale Multi-Task Test Collection of Arabic Tweets. This is the first Arabic test collection for multiple information retrieval tasks on Twitter. It supports:
- Event detection
- Ad-hoc search
- Timeline generation
- Real-time summarization
– The paper titled "EveTAR: A New Test Collection for Event Detection in Arabic Tweets" provides an overview of this collection.
– It has two versions: v1.0, which supports event detection only, and v2.0, which supports all four tasks through four subsets.
Journalists Questions on Twitter
A dataset collected as part of the work for the paper titled "What Questions Do Journalists Ask on Twitter?".
Question Identification in Arabic Tweets
A dataset collected as part of the work for the paper titled "Identification of Answer-Seeking Questions in Arabic Microblogs".
Multiple Social Platform Test Collection
– This collection contains the annotations used for the paper titled "Building Bridges across Social Platforms: Answering Twitter Questions with Yahoo! Answers". The dataset consists of four files and also includes the identifiers of the aqweets (tweets containing answerable questions) used in that work: 177 for training, 85 for validation, and 100 for testing.
– For more details, including usage instructions, please check the download link.