How to remove rows from an R data frame based on the frequency of values in a grouping column?


To remove rows from an R data frame based on the frequency of values in a grouping column, we can follow the steps below −

  • First, create a data frame.
  • Then, remove the rows that belong to infrequent groups, using the group_by and filter functions of the dplyr package together with n(), which returns the number of rows in the current group.

Create the data frame

Let's create a data frame as shown below −

> Group<-sample(c("I","II","III","IV"),20,replace=TRUE)
> Rank<-sample(1:10,20,replace=TRUE)
> df<-data.frame(Group,Rank)
> df

Executing the above script generates the output below (your output will differ because the values are sampled randomly) −

   Group Rank
1     IV    7
2      I    8
3     IV    2
4      I    9
5    III    9
6     IV    5
7     II    8
8    III    2
9    III    3
10     I    6
11    II    3
12    II    1
13    IV    7
14   III    4
15   III    5
16    IV    3
17    II    2
18   III    8
19     I    5
20   III    4
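
Before choosing a frequency cutoff, it can help to see how often each group value occurs. The following is a minimal sketch, assuming the data frame is rebuilt by hand from the output printed above so that the result does not depend on sample(); the name df_fixed is hypothetical and used only for illustration.

```r
# Rebuild the data frame from the printed output (hard-coded, so no randomness).
# df_fixed is an illustrative name, not part of the original example.
Group <- c("IV", "I", "IV", "I", "III", "IV", "II", "III", "III", "I",
           "II", "II", "IV", "III", "III", "IV", "II", "III", "I", "III")
Rank <- as.integer(c(7, 8, 2, 9, 9, 5, 8, 2, 3, 6,
                     3, 1, 7, 4, 5, 3, 2, 8, 5, 4))
df_fixed <- data.frame(Group, Rank)

# Count how many rows fall in each group
freq <- table(df_fixed$Group)
print(freq)
```

For this particular data, groups I and II each occur 4 times, III occurs 7 times and IV occurs 5 times, so a cutoff of "more than 4" keeps only III and IV.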

Removing rows from data frame based on frequencies in grouping column

Load the dplyr package, group df by the Group column, and keep only the rows whose group occurs more than 4 times; rows belonging to less frequent groups are removed −

> library(dplyr)
> df %>% group_by(Group) %>% filter(n()>4)
# A tibble: 12 x 2
# Groups:   Group [2]
   Group  Rank
   <chr> <int>
 1 IV        7
 2 IV        2
 3 III       9
 4 IV        5
 5 III       2
 6 III       3
 7 IV        7
 8 III       4
 9 III       5
10 IV        3
11 III       8
12 III       4
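
The same filtering can also be done without dplyr. The sketch below is one possible base R equivalent, not the method used above: it assumes the data frame is rebuilt by hand from the printed output (under the hypothetical name df_fixed) and uses ave() to attach each row's group size before subsetting.

```r
# Rebuild the example data from the printed output (illustrative name df_fixed)
Group <- c("IV", "I", "IV", "I", "III", "IV", "II", "III", "III", "I",
           "II", "II", "IV", "III", "III", "IV", "II", "III", "I", "III")
Rank <- as.integer(c(7, 8, 2, 9, 9, 5, 8, 2, 3, 6,
                     3, 1, 7, 4, 5, 3, 2, 8, 5, 4))
df_fixed <- data.frame(Group, Rank)

# ave() applies length() within each group and returns the result per row,
# so group_size[i] is the number of rows sharing row i's Group value
group_size <- ave(seq_along(df_fixed$Group), df_fixed$Group, FUN = length)

# Keep only rows whose group appears more than 4 times
kept <- df_fixed[group_size > 4, ]
print(kept)
```

The rows that remain are the same 12 rows from groups III and IV. Unlike the dplyr version, the result here is a plain data frame that keeps the original row names rather than a grouped tibble.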

Updated on: 13-Aug-2021
