Projects
Reformulating multilingual search engine queries using clustering and classification of documents, logs and semantic resources
Mieczysław A. Kłopotek (Institute of Computer Science) Mieczyslaw.Klopotek >at> ipipan.waw.pl
Description: The usefulness of search engine responses depends heavily on understanding user intentions. Much research has therefore been devoted to supporting query reformulation for web search that is interactive yet places minimal burden on the user. Approaches include query expansion (Mitra et al., 1998), query substitution (Jones et al., 2006), and implicit relevance feedback from users, e.g. seeking similar queries in the log based on the cosine similarity of TF-IDF vectors representing retrieved documents (Francisco et al., 2008) or based on document click behaviour (Anick, 2003). This project will investigate whether, in large-scale document collections with their inherent topical and linguistic diversity, the effectiveness of query reformulation can be improved by extended processing, including:
- viewing documents in the context of larger groups, via computation of local instead of global TF-IDF,
- seeking similarities between queries on the basis of feature vectors of document groups (either human-generated classes or automatically extracted clusters) rather than of individual documents (Ciesielski et al., 2008),
- taking into account queries and responses in different (natural) languages,
- taking into account semantic resources.
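To illustrate the first point, the difference between local and global TF-IDF can be sketched as follows. This is only a minimal illustration under stated assumptions, not the project's method: the toy corpus, the group names and the helper functions `idf` and `tfidf` are hypothetical, and the unsmoothed weight log(N / df) is just one common IDF variant.

```python
import math
from collections import Counter

def idf(docs):
    """Inverse document frequency log(N / df) over tokenized documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}

def tfidf(doc, idf_weights):
    """Raw term frequency times the supplied IDF weights."""
    tf = Counter(doc)
    return {term: tf[term] * idf_weights.get(term, 0.0) for term in tf}

# Hypothetical toy corpus, already split into two document groups
# (these could be human-made classes or automatically found clusters).
groups = {
    "astronomy": [
        ["star", "planet", "orbit"],
        ["star", "galaxy"],
        ["star", "telescope", "orbit"],
    ],
    "film": [
        ["star", "actor", "film"],
        ["director", "film"],
        ["actor", "award"],
    ],
}

# Global IDF is computed over the whole collection,
# local IDF separately within each group.
all_docs = [doc for docs in groups.values() for doc in docs]
global_idf = idf(all_docs)
local_idf = {name: idf(docs) for name, docs in groups.items()}

doc = groups["astronomy"][0]
global_vec = tfidf(doc, global_idf)
local_vec = tfidf(doc, local_idf["astronomy"])

for term in doc:
    print(f"{term:8s} global={global_vec[term]:.3f} local={local_vec[term]:.3f}")
```

In this toy example "star" occurs in every astronomy document, so its local weight within that group is zero, while its global weight stays positive because the term is absent from some documents in the other group. Local weighting thus shifts emphasis to the terms that distinguish documents inside a group.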