The Yahoo Flickr Creative Commons 100 Million (YFCC100m) dataset gives the computer vision and multimedia research community a large new dataset. To realize its potential for that community, the dataset has to be made accessible through tools that search for target concepts and mechanisms for browsing its images and videos.
This website provides such access to the YFCC100m dataset. It includes a global analysis of the dataset and an online browser for exploring and investigating subsets of the dataset in real time.
The Browser
To give quick and easy access to queries that define a specific
subset of the YFCC100m dataset, we present the YFCC100m Browser, which is designed to filter
and explore the entire dataset of 100 million images and videos in real time.
Subsets of the complete dataset can be retrieved with a straightforward keyword search and
reviewed directly.
Search & Refine
Given a user query, the browser retrieves the subset of images and videos matching the query and shows image previews as thumbnails. Each item is linked to its associated Flickr page, where further information such as comments can be found. In addition, a set of statistics for the retrieved subset is generated dynamically, including a tag cloud, the global distribution, user participation, and a timeline of items. These statistics give a first overview of the subset defined by a user query, help identify biases, and give a quick impression of the quality of the associated images and videos.
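The search-and-statistics step can be illustrated with a minimal sketch. It is not the browser's actual implementation: the `Item` record, its fields, the sample data, the placeholder URLs, and the function names are all assumptions made for illustration, and the geographic distribution is left out for brevity. It filters an in-memory list by keyword and derives tag counts, per-user counts, and per-year counts for the matching subset.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Item:
    """Hypothetical, simplified record for one image or video."""
    item_id: int
    user: str
    tags: tuple
    taken: date
    page_url: str  # placeholder for the link to the item's Flickr page


def search(items, query):
    """Return the items whose tags contain every keyword of the query."""
    keywords = {word.lower() for word in query.split()}
    return [item for item in items
            if keywords <= {tag.lower() for tag in item.tags}]


def subset_statistics(subset, query):
    """Summarise a subset: tag counts, items per user, items per year."""
    keywords = {word.lower() for word in query.split()}
    # Count co-occurring tags only; the query terms match every item anyway.
    tag_counts = Counter(
        tag.lower() for item in subset for tag in item.tags
        if tag.lower() not in keywords
    )
    return {
        "size": len(subset),
        "tag_cloud": tag_counts,
        "items_per_user": Counter(item.user for item in subset),
        "items_per_year": Counter(item.taken.year for item in subset),
    }


# Invented sample data; the URLs are placeholders, not real Flickr pages.
ITEMS = [
    Item(1, "alice", ("beach", "sunset"), date(2010, 7, 4), "https://example.org/item/1"),
    Item(2, "bob", ("city", "night"), date(2011, 1, 15), "https://example.org/item/2"),
    Item(3, "alice", ("Beach", "dog"), date(2012, 8, 20), "https://example.org/item/3"),
    Item(4, "carol", ("beach", "sunset", "dog"), date(2012, 9, 1), "https://example.org/item/4"),
]

if __name__ == "__main__":
    subset = search(ITEMS, "beach")
    stats = subset_statistics(subset, "beach")
    print(stats["size"], dict(stats["tag_cloud"]))
```

In this toy data a single user contributes two of the three "beach" items, which is the kind of bias the per-user statistic is meant to expose. A linear scan like this would not scale to 100 million items; a real system would presumably rely on an index.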
Architecture
To make the YFCC100m dataset highly accessible and to scale with demand, the online browser
runs on Google App Engine,
a platform for setting up scalable web applications on Google's infrastructure.
Multimedia Analysis and Data Mining http://madm.dfki.de