Social Stories Dataset

Download dataset. (2018)

This dataset is part of the “Benchmark of Visual Storytelling in Social Media”, a joint effort between UNL and BBC R&D within the H2020 COGNITUS project.

The dataset, first developed for the TRECVID 2018 “Social-media Video Storytelling Linking” task, comprises content (images, videos, and metadata) from two major events: Edinburgh Festival 2016 and Tour de France 2016. For each event, a set of news storylines is provided, and the collected data covers how these event stories unfolded.

Please cite our paper if you use the dataset:

Gonçalo Marcelino, David Semedo, André Mourão, Saverio Blasi, Marta Mrak, and Joao Magalhaes, A Benchmark of Visual Storytelling in Social Media, ICMR 2019.


News Quality dataset

Download dataset. (3 November 2016)

Online news editors repeatedly ask themselves the same question: what is this news article missing before it can go online? This question is not easy to answer with computational linguistic methods. In this dataset, we address it by characterising the constituents of the editorial quality of news articles. More specifically, we identify 14 aspects related to the content of news articles.


The dataset comprises 500 news articles, fully annotated with 14 aspects defining a linguistic benchmark for assessing the quality of online news articles.

Please cite our paper if you use the dataset:

I. Arapakis, F. Peleja, B. Berkant and J. Magalhaes, Linguistic Benchmarks of Online News Article Quality, ACL 2016.

Novaemötions dataset

Download dataset

This dataset contains the facial expression images captured with the novaemötions game. It comprises over 40,000 images, each labeled with the expression the player was challenged to perform and the expression recognized by the game algorithm, and augmented with labels obtained through crowdsourcing.

BBC cross-media dataset

This dataset is intended for cross-media data analysis. It contains a set of news articles, each with its text and the corresponding image illustrations. The dataset was used in a Web news classification task.

Download it here.

If you use this dataset, please cite this article:

Web news categorization using a cross-media document graph
José Iria, Fabio Ciravegna, João Magalhães
Proceedings of the ACM international conference on Image and Video Retrieval (ACM CIVR 2009).