# Projects

## Application of fuzzy statistical methods in data mining

Olgierd Hryniewicz (Systems Research Institute) Olgierd.Hryniewicz >at> ibspan.waw.pl

**Description:** Statistical methods are widely used in data mining and knowledge exploration. They provide a sound methodological background for the analysis of large data sets. However, their advantage over non-statistical methods is visible only when the mechanism of data generation is stable and identifiable. Unfortunately, real data sets are hardly ever governed by such mechanisms: the data are imprecisely reported, come from inhomogeneous sources, and are rarely described by precisely defined mathematical models. The results of applying statistical methods to such data cannot be interpreted precisely enough, e.g. in terms of classical statistical confidence intervals. Therefore, there is a need for a more flexible methodology that allows imprecise descriptions of the data and of the analyzed quantities. Fuzzy statistics seems to be an appropriate tool for coping with such problems. Among the many possible subjects of investigation, one can indicate the following: extraction of linguistically defined distinctive notions that can subsequently be used to summarize large inhomogeneous data sets, building models (e.g. regression models) with such notions as model variables, testing imprecise hypotheses using inhomogeneous data sets, etc. Investigations should focus not only on building mathematical models and decision support procedures, but also on the construction of effective algorithms, i.e. ones applicable to large data sets. Therefore, an intelligent computing methodology that allows the use of heuristic algorithms has to be applied.
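To make the idea of a linguistically defined notion used for summarization more concrete, the following is a minimal sketch of one common approach, a quantified linguistic summary of the form "most records are high". It is only an illustration and not a method prescribed by this project: the function names (`ramp`, `truth_of_summary`, `high`, `most`), the membership thresholds, and the sample data are all assumptions chosen for the example.

```python
def ramp(x, lo, hi):
    """Piecewise-linear membership: 0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def truth_of_summary(values, summarizer, quantifier):
    """Truth degree of the summary 'Q records are S'.

    The summarizer S maps each record to a membership degree in [0, 1];
    the quantifier Q is applied to the mean membership (the fuzzy proportion).
    """
    if not values:
        raise ValueError("cannot summarize an empty data set")
    proportion = sum(summarizer(v) for v in values) / len(values)
    return quantifier(proportion)


# Hypothetical fuzzy notion "high": thresholds 50 and 80 are assumptions.
def high(value):
    return ramp(value, 50.0, 80.0)


# Hypothetical fuzzy quantifier "most": thresholds 0.3 and 0.8 are assumptions.
def most(proportion):
    return ramp(proportion, 0.3, 0.8)


# Made-up sample data, for illustration only.
readings = [45.0, 65.0, 80.0, 95.0]
truth = truth_of_summary(readings, high, most)
print(f"Truth of 'most readings are high': {truth:.2f}")
```

The computation is a single pass over the data, so it scales linearly with the number of records, which matters for the large data sets the project targets. Choosing the membership functions themselves is the harder, problem-specific step.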