Show and Tell Demonstration

Title: A latent semantic retrieval system for personal photos with sparse user annotation

Date and Location:

Wednesday, April 22, 15:00 - 17:30, Location: Show and Tell Area B

Presented by

Yi-sheng Fu, Jya-cheng Hu, Chia-yu Wan, and Lin-shan Lee

Description

In this demonstration we present a user-friendly latent semantic retrieval system for personal photos with sparse user annotation. Only 10% of the photos need to be annotated manually by speech or text, yet all photos can be effectively retrieved with high-level semantic queries in words (e.g. who, what, where, when). This is possible because we use low-level image features to derive the relationships among photos, and train semantic models with Probabilistic Latent Semantic Analysis (PLSA) on fused image/speech/text features to analyze the latent "topics" of the photos. The speech/text annotations can be very sparse: a few words covering only one or two semantic categories (e.g. who or where), supplied for only 10% of the photos. These sparse annotations serve as the user interface to the whole personal photo archive, while the remaining unannotated photos are automatically related to them through the fused-feature topics learned by PLSA. The technical content of the system is described in paper number 4529, accepted for poster presentation at ICASSP 2009.
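As a rough illustration of the topic-modeling step described above, the Python sketch below fits a PLSA model by EM on a photo-by-term count matrix built from fused annotation terms. The function name, matrix layout, topic count, and iteration count are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def train_plsa(counts, n_topics, n_iters=50, seed=0):
        """Fit PLSA by EM on a (n_photos, n_terms) count matrix of fused terms."""
        rng = np.random.default_rng(seed)
        n_docs, n_terms = counts.shape

        # Random initialisation of P(z|d) and P(w|z), each row-normalised.
        p_z_given_d = rng.random((n_docs, n_topics))
        p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True)
        p_w_given_z = rng.random((n_topics, n_terms))
        p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True)

        for _ in range(n_iters):
            # E-step: P(z|d,w) proportional to P(z|d) * P(w|z)
            joint = p_z_given_d[:, :, None] * p_w_given_z[None, :, :]
            joint /= joint.sum(axis=1, keepdims=True) + 1e-12

            # M-step: re-estimate both distributions from expected counts n(d,w) * P(z|d,w)
            expected = counts[:, None, :] * joint
            p_w_given_z = expected.sum(axis=0)
            p_w_given_z /= p_w_given_z.sum(axis=1, keepdims=True) + 1e-12
            p_z_given_d = expected.sum(axis=2)
            p_z_given_d /= p_z_given_d.sum(axis=1, keepdims=True) + 1e-12

        return p_z_given_d, p_w_given_z

The returned per-photo topic distribution P(z|d) is what links annotated and unannotated photos in a shared latent space.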

Content-based image retrieval based on low-level image features has been successful but is not very attractive for personal photos, because users prefer high-level semantic descriptions in words as indices or queries, such as who, where, when, what (objects/events), and so on. The latter is not attractive either if it requires manual annotation of every individual photo. The problem remains difficult even if users annotate photos by speech when a photo is taken, because the query and its relevant photos may use different sets of words; for example, the annotation may describe the location (where), while the user later searches for a person (who). In other words, both annotations and queries are typically free-form and vary significantly. Suppose photo annotations are formulated into six categories: who, what (object), what (event), when, where, and others. When labeling a photo, users typically cover only one or two of these categories. As a result, related photos may not be labeled with similar terms (e.g. some labeled by where and others by who), and the relationships among terms in different categories are hard to capture with latent topics trained on such sparse annotations alone. The system proposed here offers a reasonable solution to all of these problems. A retrieval sketch illustrating the cross-category matching follows below.
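To make the cross-category matching concrete, the hedged sketch below scores every photo for a free-form word query under a learned PLSA model, so a photo annotated only with a location can still rank highly for a person query if the two share a latent topic. The function name, the vocabulary mapping, and the log-likelihood scoring rule are assumptions for illustration only.

    import numpy as np

    def rank_photos(query_terms, vocab, p_z_given_d, p_w_given_z):
        """Rank all photos for a word query; vocab maps a word to its term index."""
        term_ids = [vocab[w] for w in query_terms if w in vocab]
        if not term_ids:
            return np.array([], dtype=int)
        # P(w|d) = sum_z P(z|d) P(w|z) for every photo and every query term
        p_w_given_d = p_z_given_d @ p_w_given_z[:, term_ids]
        # Score each photo by the log-likelihood of the whole query
        scores = np.log(p_w_given_d + 1e-12).sum(axis=1)
        return np.argsort(-scores)  # photo indices, best match first

Because the score marginalises over latent topics rather than matching annotation words directly, photos that were never annotated with the query terms, or never annotated at all, can still be retrieved through their topic distributions.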

