Find the combination of columns for correlation coefficient greater than a certain value in R
To find the combinations of columns whose correlation coefficient is greater than a certain value, first create the correlation matrix. Then melt it into long format, with one row per pair of columns, using the melt function of the reshape2 package. Finally, subset the melted output based on the value of the correlation coefficient.
The examples below show how this works. The data are generated at random and no seed is set, so running the code yourself will give values that differ from those shown.
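The same three steps can also be done with base R alone, which is useful when reshape2 is not installed. This is a minimal sketch and not the article's method. `cor_pairs_above` is a hypothetical helper name, and the sample data are made up. `as.data.frame(as.table(...))` replaces `melt()` here. It produces the columns `Var1`, `Var2` and `value` in the same column-major order.

```r
# Hypothetical helper (not from the article): base-R sketch of the three steps
cor_pairs_above <- function(df, threshold) {
  # Step 1: Pearson correlation matrix of all (numeric) columns
  cor_matrix <- cor(df)
  # Step 2: long format, one row per ordered column pair (Var1, Var2, value);
  # as.data.frame(as.table(...)) stands in for reshape2::melt() here
  long <- as.data.frame(as.table(cor_matrix), responseName = "value")
  # Step 3: keep only the pairs whose coefficient exceeds the threshold
  long[long$value > threshold, ]
}

# Small made-up data frame: b is a monotone function of a, c is a reordering of 1..10
df <- data.frame(a = 1:10, b = (1:10)^2, c = c(3, 7, 1, 9, 4, 10, 2, 8, 5, 6))
res <- cor_pairs_above(df, 0.9)
print(res)  # the three diagonal entries plus the a/b and b/a pairs
```

As with `melt()`, the result still includes each column's correlation with itself and lists every pair in both orders.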
Example 1
The following snippet creates a sample data frame:
```r
# 20 Poisson-distributed values per column, each with a different mean
x1 <- rpois(20, 1)
x2 <- rpois(20, 10)
x3 <- rpois(20, 5)
x4 <- rpois(20, 2)
df1 <- data.frame(x1, x2, x3, x4)
df1
```
The following data frame is created:
```
   x1 x2 x3 x4
1   1  8  3  2
2   2  5  6  1
3   0  3  3  2
4   1  8  4  3
5   2  9  7  0
6   0  9  5  1
7   0 13  6  5
8   3  9  2  2
9   0 11  5  3
10  1 11  6  2
11  0 15  0  2
12  1  6  7  3
13  0  9  4  1
14  1 11  6  1
15  1  5  6  2
16  0  8  5  1
17  2  9  5  1
18  0 17  5  4
19  1  7  6  0
20  1 12  6  4
```
To create the correlation matrix of df1, add the following code to the above snippet:
```r
# Pairwise Pearson correlations between all columns of df1
cor1_matrix <- cor(df1)
cor1_matrix
```
Output
Running all of the above snippets together as a single program produces the following output:
```
           x1         x2          x3          x4
x1  1.0000000 -0.2873806  0.12162796 -0.31472199
x2 -0.2873806  1.0000000 -0.16970821  0.45119129
x3  0.1216280 -0.1697082  1.00000000 -0.02241285
x4 -0.3147220  0.4511913 -0.02241285  1.00000000
```
To load the reshape2 package and find the combinations of columns whose correlation coefficient is greater than 0.30, add the following code to the above snippet:
```r
library(reshape2)
# melt() turns the matrix into long format: one row per (Var1, Var2) pair
subset(melt(cor1_matrix), value > 0.30)
```
Output
Running all of the above snippets together as a single program produces the following output:
```
   Var1 Var2     value
1    x1   x1 1.0000000
6    x2   x2 1.0000000
8    x4   x2 0.4511913
11   x3   x3 1.0000000
14   x2   x4 0.4511913
16   x4   x4 1.0000000
```
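Because the whole symmetric matrix is melted, this output includes every self-correlation (the 1.0000000 rows) and reports the x2/x4 pair twice, once in each order. One way to list each distinct pair only once is to look at the upper triangle of the matrix alone. The following is a minimal base-R sketch, not part of the original approach. `pairs_above` is a hypothetical helper name, and the matrix is hard-coded from the correlation output printed earlier so the result is reproducible.

```r
# Hypothetical helper: list each distinct column pair once, skipping the diagonal
pairs_above <- function(cor_matrix, threshold) {
  # upper.tri() is TRUE only above the diagonal, so self-pairs and mirrored
  # duplicates (x2/x4 vs x4/x2) are excluded before the threshold is applied
  keep <- upper.tri(cor_matrix) & cor_matrix > threshold
  # which(arr.ind = TRUE) returns the row/column positions of the TRUE cells
  idx <- which(keep, arr.ind = TRUE)
  data.frame(
    Var1  = rownames(cor_matrix)[idx[, "row"]],
    Var2  = colnames(cor_matrix)[idx[, "col"]],
    value = cor_matrix[idx]
  )
}

# Correlation matrix from Example 1, hard-coded so the result is reproducible
cor1_matrix <- matrix(c(
   1.0000000, -0.2873806,  0.1216280, -0.3147220,
  -0.2873806,  1.0000000, -0.1697082,  0.4511913,
   0.1216280, -0.1697082,  1.0000000, -0.0224129,
  -0.3147220,  0.4511913, -0.0224129,  1.0000000),
  nrow = 4, dimnames = list(paste0("x", 1:4), paste0("x", 1:4)))

res <- pairs_above(cor1_matrix, 0.30)
print(res)  # a single row: Var1 = x2, Var2 = x4, value = 0.4511913
```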
Example 2
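The following snippet creates a sample data frame:
```r
# 20 normally distributed values per column; the second argument is the mean (sd = 1)
y1 <- rnorm(20)
y2 <- rnorm(20, 5)
y3 <- rnorm(20, 1.005)
df2 <- data.frame(y1, y2, y3)
df2
```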
The following data frame is created:
```
             y1       y2         y3
1   0.987216392 5.729841  1.6302391
2   0.784426157 4.229493  1.3783138
3  -0.444098876 3.623398  1.7947024
4   0.093496185 5.388854  0.7357072
5  -0.606812484 4.608422  1.5531116
6   0.681756392 4.502711  1.7351390
7   0.646009220 5.414941  1.4273596
8   0.418220626 6.227583 -0.4851824
9  -0.096372689 5.749269 -0.3193480
10  0.263341182 4.861265  1.8186878
11 -0.669565407 5.292873  1.4790937
12 -0.409141117 6.087335  1.8738509
13 -0.008184681 4.887777  1.8336940
14  1.147759554 5.431373 -0.5929404
15 -0.826403622 5.043522  0.3473174
16 -1.749526916 4.274688  0.4565382
17 -0.981464558 5.652843  2.0842843
18  1.414818984 5.136481  1.3521429
19  1.010931968 5.266047  1.7779003
20  0.674112034 5.497107  0.8404535
```
To create the correlation matrix of df2, add the following code to the above snippet:
```r
# Pairwise Pearson correlations between all columns of df2
cor2_matrix <- cor(df2)
cor2_matrix
```
Output
Running all of the above snippets together as a single program produces the following output:
```
            y1         y2          y3
y1  1.00000000  0.2162542 -0.03940615
y2  0.21625418  1.0000000 -0.30541902
y3 -0.03940615 -0.3054190  1.00000000
```
To find the combinations of columns whose correlation coefficient is less than 0.20, add the following code to the above snippet:
```r
library(reshape2)  # needed if Example 2 is run on its own
subset(melt(cor2_matrix), value < 0.20)
```
Output
Running all of the above snippets together as a single program produces the following output:
```
  Var1 Var2       value
3   y3   y1 -0.03940615
6   y3   y2 -0.30541902
7   y1   y3 -0.03940615
8   y2   y3 -0.30541902
```
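A plain `value < 0.20` filter keeps both near-zero and negative coefficients. As a result, the strongest relationship in this example (y2 and y3, about -0.31) appears alongside the weak y1/y3 pair. If the strength of a correlation matters regardless of its sign, one option is to filter on the absolute value instead. The following is a minimal base-R sketch that hard-codes the matrix printed above. The 0.30 cutoff and the names `long2` and `strong` are illustrative, and `as.data.frame(as.table(...))` replaces `melt()` so that reshape2 is not required.

```r
# Correlation matrix from Example 2, hard-coded so the sketch is reproducible
cor2_matrix <- matrix(c(
   1.00000000,  0.21625418, -0.03940615,
   0.21625418,  1.00000000, -0.30541902,
  -0.03940615, -0.30541902,  1.00000000),
  nrow = 3, dimnames = list(c("y1", "y2", "y3"), c("y1", "y2", "y3")))

# Long format in base R (one row per ordered pair), standing in for reshape2::melt()
long2 <- as.data.frame(as.table(cor2_matrix), responseName = "value")

# Keep off-diagonal pairs whose correlation is strong in either direction
strong <- subset(long2, abs(value) > 0.30 &
                   as.character(Var1) != as.character(Var2))
print(strong)  # the y3/y2 and y2/y3 rows, both -0.30541902
```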