Projects

Reformulating multilingual search engine queries using clustering and classification of documents, logs and semantic resources

Mieczysław A. Kłopotek (Institute of Computer Science) Mieczyslaw.Klopotek >at> ipipan.waw.pl

Description: The usefulness of search engine responses depends heavily on understanding user intentions. Much research has therefore been devoted to supporting query reformulation for web search (interactively, with minimal burden on the user), using, among others, query expansion (Mitra et al., 1998), query substitution (Jones et al., 2006), and implicit relevance feedback from users, e.g. seeking similar queries in the log based on the cosine similarity of TF-IDF vectors representing retrieved documents (Francisco et al., 2008) or based on document click behaviour (Anick, 2003). This research is to check whether, in large-scale document collections with their inherent topical and linguistic diversity, the effectiveness of query reformulation can be improved by extended processing, including:
  • viewing documents in the context of larger groups, by computing local rather than global TF-IDF,
  • seeking similarities between queries on the basis of feature vectors of document groups (either human-generated classes or automatically extracted clusters) rather than of individual documents (Ciesielski et al., 2008),
  • taking into account queries and responses in different (natural) languages,
  • taking into account semantic resources.
As part of the research, a measure for assessing reformulation quality will be sought, and alternative approaches will be compared experimentally according to it.
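
The first two items in the list above can be illustrated with a minimal Python sketch. It is not the project's implementation: the toy documents, the group labels, the query log in `RETRIEVED`, the smoothed IDF variant and all function names are assumptions made for illustration only. It computes IDF once over the whole collection (global) and once per group (local), and then compares two queries either through the vectors of the individual documents they retrieved or through the vectors of the groups those documents belong to.

```python
import math
from collections import Counter

# Toy collection (assumed data): (group label, tokens). The labels stand in
# for human-generated classes or automatically extracted clusters.
DOCS = [
    ("sport",   ["match", "goal", "team"]),
    ("sport",   ["team", "coach", "goal"]),
    ("sport",   ["match", "referee", "team"]),
    ("finance", ["bank", "loan", "rate"]),
    ("finance", ["bank", "credit", "rate"]),
    ("finance", ["bank", "stock", "market"]),
]

# Hypothetical query log: query -> indices of documents it retrieved.
RETRIEVED = {"mortgage": [3], "shares": [5], "football": [0]}


def idf(token_lists):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1 (one common variant)."""
    n = len(token_lists)
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}


def tfidf(tokens, weights):
    """Sparse TF-IDF vector stored as a dict term -> weight."""
    return {t: f * weights.get(t, 0.0) for t, f in Counter(tokens).items()}


def add(u, v):
    """Sum of two sparse vectors."""
    out = dict(u)
    for t, w in v.items():
        out[t] = out.get(t, 0.0) + w
    return out


def cosine(u, v):
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Global IDF uses the whole collection; local IDF uses one group at a time.
global_idf = idf([tokens for _, tokens in DOCS])
local_idf = {
    g: idf([tokens for label, tokens in DOCS if label == g])
    for g in {label for label, _ in DOCS}
}

# Document vectors with local weights, and one summed vector per group.
doc_vecs = [tfidf(tokens, local_idf[g]) for g, tokens in DOCS]
group_vecs = {}
for (g, _), vec in zip(DOCS, doc_vecs):
    group_vecs[g] = add(group_vecs.get(g, {}), vec)


def query_vec(query, by_group):
    """Represent a query by its retrieved documents or by their groups."""
    vec = {}
    for i in RETRIEVED[query]:
        vec = add(vec, group_vecs[DOCS[i][0]] if by_group else doc_vecs[i])
    return vec


def sim_docs(q1, q2):
    return cosine(query_vec(q1, False), query_vec(q2, False))


def sim_groups(q1, q2):
    return cosine(query_vec(q1, True), query_vec(q2, True))


if __name__ == "__main__":
    print("idf('bank') global vs local:",
          global_idf["bank"], local_idf["finance"]["bank"])
    print("mortgage ~ shares, documents:", sim_docs("mortgage", "shares"))
    print("mortgage ~ shares, groups:   ", sim_groups("mortgage", "shares"))
```

In this toy setting "bank" occurs in every finance document, so its local IDF within that group is lower than its global IDF. The queries "mortgage" and "shares" retrieved different documents that share only one term, yet both documents fall into the same group, so the group-level comparison rates the two queries as more similar than the document-level one does.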

 Mitra M., Singhal A., Buckley C. Improving automatic query expansion. Proc. 21st Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval (Melbourne, Australia, August 24-28, 1998). SIGIR '98. ACM, New York, NY, 206-214, 1998.

 Jones R., Rey B., Madani O., Greiner W. Generating query substitutions. Proc. 15th International Conf. on World Wide Web (Edinburgh, Scotland, May 23-26, 2006). WWW '06. ACM, New York, NY, 387-396, 2006.

 Francisco A., Baeza-Yates R., Oliveira A. Clique analysis of query log graphs. SPIRE, Springer, Melbourne, 2008.

 Anick P. Using terminological feedback for Web search refinement: a log-based study. Proc. 26th Annual International ACM SIGIR Conf. on Research and Development in Information Retrieval (Toronto, Canada). SIGIR '03. ACM, New York, NY, 88-95, 2003.

 Ciesielski K., Kłopotek M.A., Wierzchoń S.T. Term distribution-based initialization of fuzzy text clustering. In: Foundations of Intelligent Systems (ISMIS 2008). Lecture Notes in Computer Science 4994, Springer-Verlag, Toronto, Canada, 278-287, 2008.
