

The Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset gave the computer vision and multimedia research community a large new resource. To realize its potential, the dataset has to be made accessible through tools that let researchers search it for target concepts and browse its images and videos.

This website provides two ways to access the YFCC100m dataset: a global analysis of the dataset and an online browser for exploring and investigating subsets of it in real time.

The Browser

Many queries define a specific subset of the YFCC100m dataset. To give easy and quick access for such queries, we present the YFCC100m Browser, which is designed to filter and explore the entire dataset of 100 million images and videos in real time. Subsets of the complete dataset can be retrieved with a straightforward keyword search and reviewed directly.
Despite the dataset's vast size, our technology choices keep performance high enough to show query results within seconds, enabling a fluid browsing experience.

Search & Refine

Given a user query, the browser retrieves the subset of images and videos matching the query and previews the images as thumbnails. Each item is linked to its associated Flickr page, where further information such as comments can be found. In addition, a set of statistics for the retrieved subset is generated dynamically, including a tag cloud, the global distribution, user participation and a timeline of items. With this information it is possible to get a first overview of the subset defined by a user query, identify biases, and get a quick impression of the quality of the associated images and videos.
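
The search-and-statistics step described above can be illustrated with a small in-memory sketch. It is not the browser's actual implementation: the records, the field names (`user`, `year`, `tags`) and the functions `search` and `subset_statistics` are hypothetical, and the geographic distribution is left out for brevity.

```python
from collections import Counter

# Hypothetical records; the field names are illustrative, not the dataset's schema.
ITEMS = [
    {"id": 1, "user": "alice", "year": 2009, "tags": ["beach", "sunset"]},
    {"id": 2, "user": "bob", "year": 2010, "tags": ["beach", "dog"]},
    {"id": 3, "user": "alice", "year": 2010, "tags": ["city", "night"]},
]


def search(items, keyword):
    """Return the items whose tag list contains the keyword (case-insensitive)."""
    keyword = keyword.lower()
    return [item for item in items if keyword in (t.lower() for t in item["tags"])]


def subset_statistics(subset):
    """Aggregate the counts behind a tag cloud, a user chart and a timeline."""
    return {
        "tag_cloud": Counter(t for item in subset for t in item["tags"]),
        "users": Counter(item["user"] for item in subset),
        "timeline": Counter(item["year"] for item in subset),
    }


subset = search(ITEMS, "beach")
stats = subset_statistics(subset)
```

Here `stats["tag_cloud"]` holds the tag frequencies a tag cloud would be drawn from, and `stats["timeline"]` holds the per-year item counts for the subset.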


To make the YFCC100m dataset widely accessible and to scale with demand, the online browser runs on Google App Engine, a platform for building scalable web applications on Google's infrastructure.
The backend, which implements the browser's search and query mechanism, runs on Google BigQuery. This includes the retrieval, aggregation and temporary storage of the search results. Statistics of search and retrieval results are gathered and computed dynamically on the server side, while visualizations in the form of charts are rendered on the client side with JavaScript.
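
The split between server-side aggregation and client-side rendering might look roughly like the sketch below. To keep it runnable without cloud credentials, it uses an in-memory SQLite database as a stand-in for BigQuery; the `items` table, its columns and the `timeline_payload` function are assumptions made for illustration, not the browser's real schema or code.

```python
import json
import sqlite3


def timeline_payload(conn, keyword):
    """Aggregate items per year for one tag and serialize the result as JSON."""
    rows = conn.execute(
        "SELECT year, COUNT(*) FROM items WHERE tag = ? GROUP BY year ORDER BY year",
        (keyword,),
    ).fetchall()
    # The client only receives the aggregated series and draws the chart itself.
    return json.dumps({"labels": [r[0] for r in rows], "counts": [r[1] for r in rows]})


# SQLite stands in for BigQuery here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, tag TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "beach", 2009), (2, "beach", 2010), (3, "beach", 2010), (4, "city", 2010)],
)
payload = timeline_payload(conn, "beach")
```

Sending only the aggregated series keeps the response small regardless of how many items match, which is one plausible reason to compute statistics on the server and leave the drawing to the browser.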

If you like our work with the YFCC100m Browser, please cite our paper.