![Trending Articles on Technical and Non Technical topics](/images/trending_categories.jpeg)
Data Structure
Networking
RDBMS
Operating System
Java
MS Excel
iOS
HTML
CSS
Android
Python
C Programming
C++
C#
MongoDB
MySQL
Javascript
PHP
Physics
Chemistry
Biology
Mathematics
English
Economics
Psychology
Social Studies
Fashion Studies
Legal Studies
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
How to create a data frame of the maximum value for each group in an R data frame using dplyr?
Sometimes subsetting of group wise maximum values is required while doing the data analysis and this subset of the data frame is used for comparative analysis. The main objective is to compare these maximums with each other or with a threshold value. In R, we can find the group wise maximum value by using group_by and slice functions in dplyr package.
Example
Consider the below data frame −
> x<-rep(c("S1","S2","S3","S4"),times=c(6,12,12,10)) > y<-1:40 > df<-data.frame(x,y) > head(df,20) x y 1 S1 1 2 S1 2 3 S1 3 4 S1 4 5 S1 5 6 S1 6 7 S2 7 8 S2 8 9 S2 9 10 S2 10 11 S2 11 12 S2 12 13 S2 13 14 S2 14 15 S2 15 16 S2 16 17 S2 17 18 S2 18 19 S3 19 20 S3 20
Loading dplyr package −
> library(dplyr)
Finding the data frame of maximum values using group_by and slice −
> Group_Maximum_df<-df%>%group_by(x)%>%slice(which.max(y)) > Group_Maximum_df # A tibble: 4 x 2 # Groups: x [4] x y <fct> <int> 1 S1 6 2 S2 18 3 S3 30 4 S4 40
In the above example,, we have a factor variable. Now let’s consider that instead of a factor variable, our group variable is represented with integer values. In this case as well, the procedure to find the data frame of maximum values will be same −
> a<-rep(c(1,2,3,4,5),times=c(5,10,10,5,10)) > b<-rep(c(25,23,21,24),times=c(10,10,10,10)) > df2<-data.frame(a,b) > head(df2,20) a b 1 1 25 2 1 25 3 1 25 4 1 25 5 1 25 6 2 25 7 2 25 8 2 25 9 2 25 10 2 25 11 2 23 12 2 23 13 2 23 14 2 23 15 2 23 16 3 23 17 3 23 18 3 23 19 3 23 20 3 23 > Group_Maximum_df2<-df2%>%group_by(a)%>%slice(which.max(b)) > Group_Maximum_df2 # A tibble: 5 x 2 # Groups: a [5] a b <dbl> <dbl> 1 1 25 2 2 25 3 3 23 4 4 21 5 5 24
Advertisements