How to extract unique combinations of two or more variables in an R data frame?


An R data frame can contain many categorical variables, and the values of these variables form different combinations. For example, one value of a variable can occur together with two or more values of another variable, while a variable whose categories are all unique occurs with exactly one value of each other variable. To find the unique combinations for any number of variables, select those columns and pass them to the unique function, which drops the repeated rows.

Example

Consider the following data frame −

> x1<-rep(c(1,2,3,4,5),times=4)
> x2<-rep(c("A","B","C","D"),each=5)
> x3<-letters[1:20]
> x4<-rep(c(5,10,15,20),times=c(2,8,6,4))
> df<-data.frame(x1,x2,x3,x4)
> df
   x1 x2 x3 x4
1   1  A  a  5
2   2  A  b  5
3   3  A  c 10
4   4  A  d 10
5   5  A  e 10
6   1  B  f 10
7   2  B  g 10
8   3  B  h 10
9   4  B  i 10
10  5  B  j 10
11  1  C  k 15
12  2  C  l 15
13  3  C  m 15
14  4  C  n 15
15  5  C  o 15
16  1  D  p 15
17  2  D  q 20
18  3  D  r 20
19  4  D  s 20
20  5  D  t 20

Finding the unique combinations of x2 and x4 −

> unique(df[c("x2","x4")])
   x2 x4
1   A  5
3   A 10
6   B 10
11  C 15
16  D 15
17  D 20
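
The row names in this output (1, 3, 6, 11, 16, 17) are the positions in df of the first row where each combination appears. The following is a minimal sketch, using only base R, of how the result could be counted and renumbered. The names combos and n_combos are illustrative and do not come from the original example.

```r
# Rebuild the example data frame so the snippet runs on its own
x1 <- rep(c(1, 2, 3, 4, 5), times = 4)
x2 <- rep(c("A", "B", "C", "D"), each = 5)
x3 <- letters[1:20]
x4 <- rep(c(5, 10, 15, 20), times = c(2, 8, 6, 4))
df <- data.frame(x1, x2, x3, x4)

# Keep one row per distinct (x2, x4) pair
combos <- unique(df[c("x2", "x4")])   # 'combos' is an illustrative name

# Number of distinct combinations
n_combos <- nrow(combos)

# Replace the original row positions with consecutive row numbers
rownames(combos) <- NULL
```

Resetting the row names is optional; keeping them can be useful when the position of the first occurrence matters.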

Finding the unique combinations of x1, x3, and x4 −

> unique(df[c("x1","x3","x4")])
   x1 x3 x4
1   1  a  5
2   2  b  5
3   3  c 10
4   4  d 10
5   5  e 10
6   1  f 10
7   2  g 10
8   3  h 10
9   4  i 10
10  5  j 10
11  1  k 15
12  2  l 15
13  3  m 15
14  4  n 15
15  5  o 15
16  1  p 15
17  2  q 20
18  3  r 20
19  4  s 20
20  5  t 20

Finding the unique combinations of x1, x2, and x4 −

> unique(df[c("x1","x2","x4")])
   x1 x2 x4
1   1  A  5
2   2  A  5
3   3  A 10
4   4  A 10
5   5  A 10
6   1  B 10
7   2  B 10
8   3  B 10
9   4  B 10
10  5  B 10
11  1  C 15
12  2  C 15
13  3  C 15
14  4  C 15
15  5  C 15
16  1  D 15
17  2  D 20
18  3  D 20
19  4  D 20
20  5  D 20
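
Here all 20 rows come back, which means that no combination of x1, x2, and x4 is repeated in df. One possible way to confirm this without printing the full result is sketched below with the base R functions anyDuplicated and duplicated. The names cols, all_unique, and n_dup are illustrative assumptions.

```r
# Rebuild the example data frame so the snippet runs on its own
x1 <- rep(c(1, 2, 3, 4, 5), times = 4)
x2 <- rep(c("A", "B", "C", "D"), each = 5)
x3 <- letters[1:20]
x4 <- rep(c(5, 10, 15, 20), times = c(2, 8, 6, 4))
df <- data.frame(x1, x2, x3, x4)

cols <- c("x1", "x2", "x4")   # illustrative name for the selected columns

# anyDuplicated() returns 0 when no row repeats,
# otherwise the index of the first repeated row
all_unique <- anyDuplicated(df[cols]) == 0

# For comparison: how many rows repeat an earlier (x2, x4) pair
n_dup <- sum(duplicated(df[c("x2", "x4")]))
```

With this data, all_unique is TRUE for the three-column selection, while the (x2, x4) selection has repeated rows, which is why unique returned fewer rows for it.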

Updated on: 11-Aug-2020
