Soft methods in data mining and statistical reasoning

Przemysław Grzegorzewski (Faculty of Mathematics and Information Science)    pgrzeg >at>

Description: Statistics provides a well-established methodology for decision making and for exploring many areas of human activity, such as engineering, science, biology, medicine, economics and the social sciences. However, applied statistics, unlike mathematical statistics, starts from real data, which may cause problems before any statistical reasoning can be performed. This is because classical statistical methods and procedures have usually been constructed for ideal models that are too rigid for real-life situations. Such ideal models generally assume homogeneous data sets consisting of precise, well-defined and unambiguously characterized observations. Classical statistical procedures also assume precisely stated requirements, well-defined steering parameters and so on. Unfortunately, the outputs of real-life experiments and observations are very seldom so perfect. This is especially apparent when the data are reported by humans, who are often imprecise, use vague terms and describe their observations linguistically rather than formally. It is also common for the decision maker's requirements to be more qualitative than quantitative and to rest largely on impressions.
To cope with these problems, fuzzy methods, possibilistic reasoning, various numerical techniques, genetic algorithms and other approaches have been incorporated into applied statistics. All these methods, sometimes referred to as soft methods, are extremely useful both in data mining and in statistical reasoning. In particular, they help in extracting significant information from vast databases, in aggregating the available information and in processing it further. Soft methods provide convenient tools for modeling imprecision and for constructing statistical procedures that cope successfully with vague data, imprecise hypotheses, imprecise requirements etc. In addition, intelligent computing enables the construction of more efficient procedures (both data mining algorithms and inferential procedures). Thus soft methods, together with classical statistical reasoning based on probability theory, appear to offer a powerful framework for information processing and the management of uncertainty.
The objectives of the project include the following tasks:

  • Hypothesis testing admitting imprecision at different levels: data, hypotheses, requirements and decisions.
  • Statistical learning (mainly classification and clustering) involving imprecision, conflicts and ignorance.
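
As an illustration of imprecision at the data level, a vague observation such as "about 2" is often modeled as a triangular fuzzy number. The following is only a minimal sketch under that assumption, not a method taken from the project or the cited papers; the class `TriangularFuzzyNumber` and the function `fuzzy_mean` are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriangularFuzzyNumber:
    """Hypothetical model of a vague observation: 'about peak',
    certainly not below `left` and not above `right`."""
    left: float
    peak: float
    right: float

    def __post_init__(self):
        if not (self.left <= self.peak <= self.right):
            raise ValueError("expected left <= peak <= right")

    def membership(self, x: float) -> float:
        """Degree (0..1) to which x is compatible with the observation."""
        if x == self.peak:
            return 1.0
        if x <= self.left or x >= self.right:
            return 0.0
        if x < self.peak:
            return (x - self.left) / (self.peak - self.left)
        return (self.right - x) / (self.right - self.peak)

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Interval of values whose membership is at least alpha."""
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        lower = self.left + alpha * (self.peak - self.left)
        upper = self.right - alpha * (self.right - self.peak)
        return (lower, upper)


def fuzzy_mean(sample: list[TriangularFuzzyNumber]) -> TriangularFuzzyNumber:
    """Sample mean under standard fuzzy arithmetic: for triangular
    fuzzy numbers it is triangular, with averaged parameters."""
    if not sample:
        raise ValueError("sample must not be empty")
    n = len(sample)
    return TriangularFuzzyNumber(
        sum(t.left for t in sample) / n,
        sum(t.peak for t in sample) / n,
        sum(t.right for t in sample) / n,
    )


if __name__ == "__main__":
    vague_sample = [
        TriangularFuzzyNumber(1.0, 2.0, 3.0),  # "about 2"
        TriangularFuzzyNumber(3.0, 4.0, 5.0),  # "about 4"
    ]
    print(fuzzy_mean(vague_sample))
    print(vague_sample[0].alpha_cut(0.5))
```

The mean of a vague sample is itself vague here, so any test statistic built from it inherits the imprecision of the data rather than hiding it behind a single number.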

The research will start with a study of the relevant literature; see, e.g., (Couso and Dubois, 2009), (González-Rodríguez et al., 2006), (Grzegorzewski, 2000), (Grzegorzewski, 2006), (Grzegorzewski, 2009a), (Grzegorzewski, 2009b), (Hüllermeier and Brinker, 2008), (Hüllermeier and Vanderlooy, 2010).

  Couso I., Dubois D., On the variability of the concept of variance for fuzzy random variables, IEEE Trans. on Fuzzy Syst., 17 (2009), 1070-1080.

  González-Rodríguez G., Colubi A., Gil M.A., A fuzzy representation of random variables: An operational tool in exploratory analysis and hypothesis testing, Computational Statistics and Data Analysis, 51 (2006), 163-176.

  Grzegorzewski P., Testing statistical hypotheses with vague data, Fuzzy Sets and Systems, 112 (2000), 501-510.

  Grzegorzewski P., The coefficient of concordance for vague data, Computational Statistics and Data Analysis, 51 (2006), 314-322.

  Grzegorzewski P., Kendall's correlation coefficient for vague preferences, Soft Computing, 13 (2009), 1055-1061.

  Grzegorzewski P., k-sample median test for vague data, International Journal of Intelligent Systems, 24 (2009), 529-539.

  Hüllermeier E., Brinker C., Learning valued preference structures for solving classification problems, Fuzzy Sets and Systems, 159 (2008), 2337-2352.

  Hüllermeier E., Vanderlooy S., Combining predictions in pairwise classification: An optimal adaptive voting strategy and its relation to weighted voting, Pattern Recognition, 43 (2010), 128-142.
