Why use Support and Confidence in data mining?


Support is an important measure because a rule with very low support may occur simply by chance. A low-support rule is also likely to be uninteresting from a business viewpoint, because it may not be profitable to promote items that customers seldom purchase together.

An association rule is an implication expression of the form X→Y, where X and Y are disjoint itemsets, i.e., $X \cap Y = \emptyset$. The strength of an association rule can be measured in terms of its support and confidence. Support determines how often a rule is applicable to a given data set, while confidence determines how frequently items in Y appear in transactions that contain X.
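Formally, if $N$ is the total number of transactions and $\sigma(\cdot)$ denotes the support count of an itemset (the number of transactions that contain it), the two measures are defined as:

$$\mathrm{support}(X \to Y) = \frac{\sigma(X \cup Y)}{N}, \qquad \mathrm{confidence}(X \to Y) = \frac{\sigma(X \cup Y)}{\sigma(X)}$$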

Confidence, in turn, measures the reliability of the inference made by a rule. For a given rule X→Y, the higher the confidence, the more likely it is for Y to be present in transactions that contain X. Confidence also provides an estimate of the conditional probability of Y given X.
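As a minimal sketch of these definitions (the transaction data and helper functions below are illustrative, not part of any particular library), both measures can be computed directly from a list of transactions:

```python
def support_count(transactions, itemset):
    """Count the transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset.issubset(t))

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent."""
    n = len(transactions)
    both = support_count(transactions, antecedent | consequent)
    ante = support_count(transactions, antecedent)
    return both / n, (both / ante if ante else 0.0)

# Illustrative market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

sup, conf = rule_metrics(transactions, {"milk", "diapers"}, {"beer"})
print(f"support = {sup:.2f}, confidence = {conf:.2f}")
# support = 0.40, confidence = 0.67
```

Here the rule {milk, diapers} → {beer} holds in 2 of the 5 transactions (support 0.40) and in 2 of the 3 transactions that contain the antecedent (confidence ≈ 0.67), matching the conditional-probability interpretation above.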

The results of association analysis should be interpreted with care. The inference made by an association rule does not necessarily imply causality. Instead, it suggests a strong co-occurrence relationship between the items in the antecedent and the consequent of the rule. Causality requires knowledge about the cause and effect attributes in the data and typically involves relationships occurring over time.

The association rule mining problem can be stated as follows −

Association Rule Discovery − Given a set of transactions T, find all rules having support ≥ minsup and confidence ≥ minconf, where minsup and minconf are the corresponding support and confidence thresholds.

A brute-force approach to mining association rules is to compute the support and confidence of every possible rule. This approach is prohibitively expensive because the number of rules that can be extracted from a data set grows exponentially with the number of items.
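Concretely, the total number of possible rules that can be extracted from a data set containing $d$ items is

$$R = 3^d - 2^{d+1} + 1$$

so even a small data set with $d = 6$ items produces $3^6 - 2^7 + 1 = 602$ candidate rules, most of which are discarded once the thresholds are applied.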

An initial step toward improving the performance of association rule mining algorithms is to decouple the support and confidence requirements. The support of a rule X→Y depends only on the support of its combined itemset X ∪ Y; for example, the six candidate rules that can be derived from the 3-itemset {Beer, Diapers, Milk} all have identical support. Hence, if the itemset is infrequent, all six candidate rules can be pruned immediately without having to compute their confidence values.
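The following sketch (function names and data layout are hypothetical) shows how this decoupling works in practice: a single support check on the itemset prunes every rule derived from it before any confidence value is computed:

```python
from itertools import combinations

def candidate_rules(itemset):
    """All splits of an itemset into non-empty antecedent -> consequent pairs.
    A 3-itemset such as {beer, diapers, milk} yields six candidate rules."""
    itemset = frozenset(itemset)
    for k in range(1, len(itemset)):
        for ante in combinations(sorted(itemset), k):
            ante = frozenset(ante)
            yield ante, itemset - ante

def rules_worth_scoring(transactions, itemset, minsup):
    """Support of X -> Y depends only on X | Y, so one infrequent itemset
    prunes all of its candidate rules in a single step."""
    n = len(transactions)
    support = sum(1 for t in transactions if set(itemset) <= t) / n
    if support < minsup:
        return []  # every candidate rule pruned; no confidence computed
    return list(candidate_rules(itemset))
```

With the transactions from the earlier sketch, `rules_worth_scoring(transactions, {"beer", "eggs", "cola"}, 0.4)` returns an empty list: the itemset's support is 0, so none of its six candidate rules ever reach the confidence computation.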

Therefore, a common strategy adopted by many association rule mining algorithms is to decompose the problem into two major subtasks −

Frequent Itemset Generation − The objective is to find all the itemsets that satisfy the minsup threshold. These itemsets are known as frequent itemsets.

Rule Generation − The objective is to extract all the high-confidence rules from the frequent itemsets found in the previous step. These rules are known as strong rules. The computational requirements of frequent itemset generation are generally higher than those of rule generation, as the sketch below illustrates.
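As a rough sketch of this two-phase decomposition (an Apriori-style level-wise search; the names and the simplified candidate generation are assumptions for illustration, not a production implementation):

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Phase 1 - frequent itemset generation (Apriori-style): candidates at
    level k + 1 are unions of frequent k-itemsets, so infrequent branches
    of the search space are never explored."""
    n = len(transactions)
    support = lambda s: sum(1 for t in transactions if s <= t) / n

    level = [frozenset([item]) for item in set().union(*transactions)]
    frequent = {}
    while level:
        kept = {s: v for s in level if (v := support(s)) >= minsup}
        frequent.update(kept)
        keys = list(kept)
        # join step: pair up k-itemsets whose union has exactly k + 1 items
        level = list({a | b for i, a in enumerate(keys)
                      for b in keys[i + 1:] if len(a | b) == len(a) + 1})
    return frequent

def strong_rules(frequent, minconf):
    """Phase 2 - rule generation: confidence is a ratio of two itemset
    supports, so no further passes over the transactions are needed."""
    rules = []
    for itemset, sup_xy in frequent.items():
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                ante = frozenset(ante)
                conf = sup_xy / frequent[ante]
                if conf >= minconf:
                    rules.append((set(ante), set(itemset - ante), sup_xy, conf))
    return rules

# Reusing the transactions list from the earlier sketch:
for x, y, s, c in strong_rules(frequent_itemsets(transactions, 0.4), 0.6):
    print(sorted(x), "->", sorted(y), f"support={s:.2f} confidence={c:.2f}")
```

Phase 1 dominates the cost because it must repeatedly scan the transactions, whereas Phase 2 only combines supports that are already known, which matches the observation above about the relative computational requirements.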
