Data Mining Articles
What are Grid-Based Methods?
Grid-based clustering methods use a multi-resolution grid data structure. They quantize the object space into a finite number of cells that form a grid structure on which all clustering operations are performed. The main benefit of the approach is its fast processing time, which is generally independent of the number of data objects and depends only on the number of cells in each dimension of the quantized space. Examples of the grid-based approach include STING, which explores statistical information stored in the grid cells; WaveCluster, which clusters objects using a wavelet transform; and CLIQUE, which defines a ...
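The quantization step can be illustrated with a minimal sketch. It is not STING, WaveCluster, or CLIQUE; it only maps points to grid cells and counts objects per cell. The function name `quantize`, the bounds, the cell count, and the density threshold of 2 are all assumptions made for illustration.

```python
from collections import Counter

def quantize(points, lower, upper, cells_per_dim):
    """Count how many points fall into each cell of a uniform grid."""
    counts = Counter()
    for p in points:
        idx = []
        for x, lo, hi in zip(p, lower, upper):
            width = (hi - lo) / cells_per_dim
            i = int((x - lo) / width)
            # Clamp so that points on the upper boundary land in the last cell.
            idx.append(min(max(i, 0), cells_per_dim - 1))
        counts[tuple(idx)] += 1
    return counts

points = [(0.1, 0.2), (0.15, 0.22), (0.8, 0.9), (0.82, 0.88), (0.5, 0.5)]
cell_counts = quantize(points, lower=(0.0, 0.0), upper=(1.0, 1.0), cells_per_dim=4)

# Treat a cell as "dense" when it holds at least 2 points (hypothetical threshold).
dense_cells = {cell for cell, n in cell_counts.items() if n >= 2}
```

Once the counts exist, later steps work on the cells rather than on the individual objects, which is why the cost depends on the number of cells and not on the number of points.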
What is Chameleon?
Chameleon is a hierarchical clustering algorithm that uses dynamic modeling to determine the similarity between pairs of clusters. It was derived from the observed weaknesses of two hierarchical clustering algorithms, ROCK and CURE. ROCK and related schemes emphasize cluster interconnectivity while ignoring information about cluster proximity. CURE and related schemes consider cluster proximity yet ignore cluster interconnectivity. In Chameleon, cluster similarity is assessed based on how well connected objects are within a cluster and on the proximity of clusters. Specifically, two clusters are merged if their interconnectivity is high and they are close together. It does not depend on a ...
How efficient is the k-medoids algorithm on large data sets?
A classic k-medoids partitioning algorithm like PAM works efficiently for small data sets but does not scale well to large data sets. To deal with larger data sets, a sampling-based method known as CLARA (Clustering LARge Applications) can be used. The idea behind CLARA is as follows: if the sample is chosen in a fairly random manner, it should closely represent the original data set. The representative objects (medoids) chosen will then be similar to those that would have been selected from the entire data set. CLARA draws several samples of the data set, applies PAM on each sample, and returns ...
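The sample-then-PAM loop can be sketched as follows. This is a simplified illustration, not a reference implementation: `pam` is a bare-bones swap search with a naive initialization, and keeping the medoid set with the lowest cost on the full data set is an assumption about the selection rule, since the excerpt is cut off before stating what CLARA returns. All function names and parameter defaults are hypothetical.

```python
import random

def total_cost(data, medoids, dist):
    """Sum of distances from each object to its nearest medoid."""
    return sum(min(dist(x, m) for m in medoids) for x in data)

def pam(data, k, dist):
    """Simplified PAM: swap a medoid for a non-medoid while the cost drops."""
    medoids = list(data[:k])  # naive initial choice (assumption of this sketch)
    best = total_cost(data, medoids, dist)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in data:
                if cand in medoids:
                    continue
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(data, trial, dist)
                if cost < best:
                    medoids, best, improved = trial, cost, True
    return medoids

def clara(data, k, dist, n_samples=5, sample_size=8, seed=0):
    """Run PAM on several random samples; keep the best medoids overall."""
    rng = random.Random(seed)
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(data, min(sample_size, len(data)))
        medoids = pam(sample, k, dist)
        # Score the sample's medoids against the whole data set.
        cost = total_cost(data, medoids, dist)
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

def abs_dist(a, b):
    return abs(a - b)

# Two well-separated 1-D groups: 0..9 and 1000..1009.
data = [float(v) for v in range(10)] + [float(v) for v in range(1000, 1010)]
medoids, cost = clara(data, k=2, dist=abs_dist)
```

PAM only ever sees `sample_size` objects per run, so the expensive swap search stays small; the full data set is touched only once per sample, when the resulting medoids are scored.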
What are the requirements of clustering in data mining?
The requirements of clustering in data mining are as follows − Scalability − Some clustering algorithms work well on small data sets containing fewer than several hundred data objects. A large database, however, can contain millions of objects. Clustering on a sample of a given large data set can lead to biased results. Highly scalable clustering algorithms are required. Ability to deal with different types of attributes − Some algorithms are designed to cluster interval-based (numerical) data. However, applications can require clustering other types of data, including binary, categorical (nominal), and ordinal data, or a combination of these data ...
How can we further improve the efficiency of Apriori-based mining?
Several variations of the Apriori algorithm have been proposed that aim to improve the efficiency of the original algorithm. They are as follows − The hash-based technique (hashing itemsets into corresponding buckets) − A hash-based technique can be used to reduce the size of the candidate k-itemsets, Ck, for k > 1. For instance, when scanning each transaction in the database to generate the frequent 1-itemsets, L1, from the candidate 1-itemsets in C1, we can generate the 2-itemsets for each transaction, hash (i.e., map) them into the different buckets of a hash table structure, and increase the corresponding bucket ...
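The bucket-counting idea can be sketched in a few lines. While counting single items, the scan below also hashes every 2-itemset of each transaction into a bucket; a pair whose bucket count is below the minimum support cannot be frequent, so it is dropped from the candidate set. The hash function `(a * 10 + b) % n_buckets`, the toy transactions, and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations

def hash_pair(pair, n_buckets):
    """Hypothetical hash function for a sorted pair of integer item ids."""
    a, b = pair
    return (a * 10 + b) % n_buckets

def scan(transactions, n_buckets):
    """One pass: count single items and hash each 2-itemset into a bucket."""
    item_counts = Counter()
    bucket_counts = [0] * n_buckets
    for t in transactions:
        items = sorted(set(t))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[hash_pair(pair, n_buckets)] += 1
    return item_counts, bucket_counts

def candidate_pairs(item_counts, bucket_counts, min_support):
    """Keep pairs of frequent items whose bucket count reaches min_support."""
    frequent = sorted(i for i, c in item_counts.items() if c >= min_support)
    n = len(bucket_counts)
    return [p for p in combinations(frequent, 2)
            if bucket_counts[hash_pair(p, n)] >= min_support]

transactions = [[1, 2, 3], [1, 2, 4], [1, 2], [3], [4]]
item_counts, bucket_counts = scan(transactions, n_buckets=7)
candidates = candidate_pairs(item_counts, bucket_counts, min_support=2)
```

In this toy data all four items are frequent at a minimum support of 2, so plain Apriori would generate six candidate pairs; the bucket counts leave only `(1, 2)`. Because different pairs can share a bucket, the filter can keep an infrequent pair but never discards a frequent one.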
What are the Web-based tools in OLAP?
The main Web-based OLAP tools are as follows − Arbor Essbase Web − This tool provides features such as drilling up, down, and across; slice and dice; and powerful reporting, all for OLAP. It also provides data entry, including full multi-user concurrent write capabilities. Arbor Essbase is only a server product; no client package exists, thus protecting its own desktop client version market. The Web product does not replace the administrative and development modules; it replaces only user access for queries and updates. Information Advantage Web OLAP − This product uses a server-centric messaging architecture, composed of a powerful analytic engine ...
What are the characteristics of OLAP?
The FASMI Test − It describes the characteristics of an OLAP application in a specific way, without dictating how it should be implemented. Fast − The system is targeted to deliver most responses to users within about five seconds, with the simplest analyses taking no more than one second and very few taking more than 20 seconds. Independent research in the Netherlands has shown that end users assume a process has failed if results are not received within 30 seconds, and they are apt to hit 'Alt+Ctrl+Delete' unless the system warns them that the report will take longer. Analysis − It ...
What are Hierarchical Methods?
A hierarchical clustering technique works by grouping data objects into a tree of clusters. Hierarchical clustering algorithms are either top-down or bottom-up. The quality of a pure hierarchical clustering method suffers from its inability to perform adjustment once a merge or split decision has been made. The merging of clusters is based on the distance between clusters. The widely used measures for the distance between clusters are as follows, where mi is the mean for cluster Ci, ni is the number of points in Ci, and |p – p'| is the distance between two points p and p'. Types of Hierarchical Clustering Methods − There ...
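The excerpt defines the symbols but does not show the measures themselves. A minimal sketch of the four measures commonly paired with this notation (minimum, maximum, mean, and average distance) is given below; that these are the ones the article lists is an assumption, and Euclidean distance stands in for |p – p'|.

```python
import math

def d_min(ci, cj):
    """Minimum distance: closest pair of points across the two clusters."""
    return min(math.dist(p, q) for p in ci for q in cj)

def d_max(ci, cj):
    """Maximum distance: farthest pair of points across the two clusters."""
    return max(math.dist(p, q) for p in ci for q in cj)

def mean(cluster):
    """Coordinate-wise mean m_i of a cluster."""
    return tuple(sum(coord) / len(cluster) for coord in zip(*cluster))

def d_mean(ci, cj):
    """Mean distance: |m_i - m_j|, the distance between cluster means."""
    return math.dist(mean(ci), mean(cj))

def d_avg(ci, cj):
    """Average distance: sum of all cross-cluster distances / (n_i * n_j)."""
    total = sum(math.dist(p, q) for p in ci for q in cj)
    return total / (len(ci) * len(cj))

ci = [(0.0, 0.0), (0.0, 2.0)]
cj = [(3.0, 0.0), (3.0, 2.0)]
distances = {
    "min": d_min(ci, cj),
    "max": d_max(ci, cj),
    "mean": d_mean(ci, cj),
    "avg": d_avg(ci, cj),
}
```

A bottom-up method that merges on the minimum distance behaves like single linkage, while merging on the maximum distance behaves like complete linkage.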
How does discordancy testing work?
A statistical discordancy test examines two hypotheses: a working hypothesis and an alternative hypothesis. A working hypothesis, H, is a statement that the entire data set of n objects comes from an initial distribution model, F, i.e., H: oi ∈ F, where i = 1, 2, …, n. The hypothesis is retained if there is no statistically significant evidence supporting its rejection. A discordancy test checks whether an object oi is significantly large (or small) in relation to the distribution F. Different test statistics have been proposed for use as a discordancy test, depending on the available knowledge of the data. Suppose that some statistic ...
What are the methods of clustering?
The main methods of clustering are as follows − Partitioning Methods − Given a database of n objects or data tuples, a partitioning method constructs k partitions of the data, where each partition represents a cluster, and k < n. It divides the data into k groups, which must satisfy the following requirements − Each group must contain at least one object. Each object must belong to exactly one group. Given k, the number of partitions to construct, a partitioning method creates an initial partitioning. It then uses an iterative relocation technique that attempts to improve the partitioning by moving ...
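The two requirements on the groups can be checked mechanically. The helper below is a small sketch; the name `is_valid_partition` is hypothetical, and it assumes the objects are hashable and distinct.

```python
def is_valid_partition(objects, groups):
    """Check that every group is non-empty and every object is in exactly one group."""
    if any(len(g) == 0 for g in groups):
        return False  # a group without objects violates the first requirement
    assigned = [o for g in groups for o in g]
    no_overlap = len(assigned) == len(set(assigned))
    covers_all = set(assigned) == set(objects)
    return no_overlap and covers_all

objects = [1, 2, 3, 4]
ok = is_valid_partition(objects, [[1, 2], [3, 4]])
overlapping = is_valid_partition(objects, [[1, 2], [2, 3, 4]])
empty_group = is_valid_partition(objects, [[1, 2, 3, 4], []])
missing_object = is_valid_partition(objects, [[1, 2], [3]])
```

An iterative relocation step should preserve both properties: an object may change groups, but it should never leave a group empty or end up in two groups at once.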