In this paper, we propose two sampling theories of rule discovery based on generality and accuracy. The first theory concerns the worst case: it extends a preliminary version of PAC learning, which represents a worst-case analysis for classification. In our analysis, a rule is defined as a probabilistic constraint of true assignment to the class attribute for corresponding examples, and we mainly analyze the case in which we try to avoid finding a bad rule. Effectiveness of our approach is demonstrated through examples for conjunction-rule discovery. The second theory concerns a distribution-based case: it represents the conditions that a rule exceeds pre-specified thresholds for generality and accuracy with high reliability. The idea is to assume a 2-dimensional normal distribution for two probabilistic variables, and obtain the conditions based on their confidence region. This approach has been validated experimentally using 21 benchmark data sets in the machine learning community against conventional methods each of which evaluates the reliability of generality. Discussions on related work are provided for PAC learning, multiple comparison, and analysis of association-rule discovery.
|出版ステータス||出版済み - 1月 2005|
!!!All Science Journal Classification (ASJC) codes