How to remove everything up to and including the underscore from column values of an R data frame?


If a column in an R data frame contains string values that share a common prefix separated from the rest of the value by an underscore, such as ID_1, ID_2 and so on, the prefix and the underscore make the values longer without adding information. Removing them from all the values at once makes the data easier to read and to analyze. For this purpose, we can use the gsub function.

Consider the below data frame −

Example

set.seed(191)
ID<-c("ID_1","ID_2","ID_3","ID_4","ID_5","ID_6","ID_7","ID_8","ID_9","ID_10","ID_11","ID_12","ID_13","ID_14","ID_15","ID_16","ID_17","ID_18","ID_19","ID_20")
Salary<-sample(20000:50000,20)
df1<-data.frame(ID,Salary)
df1

Output

   ID   Salary
1  ID_1  33170
2  ID_2  22747
3  ID_3  42886
4  ID_4  22031
5  ID_5  45668
6  ID_6  32584
7  ID_7  34779
8  ID_8  20471
9  ID_9  38689
10 ID_10 29660
11 ID_11 49664
12 ID_12 24284
13 ID_13 36537
14 ID_14 37693
15 ID_15 30265
16 ID_16 36004
17 ID_17 48247
18 ID_18 20750
19 ID_19 27400
20 ID_20 20553

Removing everything up to and including the underscore from the values in column ID −

Example

df1$ID<-gsub("^.*_","",df1$ID)
df1

Output

   ID Salary
1   1  33170
2   2  22747
3   3  42886
4   4  22031
5   5  45668
6   6  32584
7   7  34779
8   8  20471
9   9  38689
10 10  29660
11 11  49664
12 12  24284
13 13  36537
14 14  37693
15 15  30265
16 16  36004
17 17  48247
18 18  20750
19 19  27400
20 20  20553
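The pattern used above is greedy: .* matches as much as it can, so everything up to the last underscore is removed. That makes no difference for values with a single underscore, but it matters when a value contains several. The following is a minimal sketch on a hypothetical vector x (not part of the data frames in this article) that compares the greedy pattern with one that stops at the first underscore.

```r
# Hypothetical values, some with more than one underscore
x <- c("ID_1", "EMP_ID_2", "A_B_C_3")

# Greedy: ".*" runs to the LAST underscore, so only the final piece is kept
after_last <- sub("^.*_", "", x)

# "[^_]*" cannot cross an underscore, so only the part up to the FIRST one is removed
after_first <- sub("^[^_]*_", "", x)

print(after_last)
print(after_first)
```

Since both patterns are anchored with ^, they can match at most once per value, so sub and gsub give the same result here.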

Let’s have a look at another example −

Example

Group<-c("GRP_1","GRP_2","GRP_3","GRP_4","GRP_5","GRP_6","GRP_7","GRP_8","GRP_9","GRP_10","GRP_11","GRP_12","GRP_13","GRP_14","GRP_15","GRP_16","GRP_17","GRP_18","GRP_19","GRP_20")
Ratings<-sample(0:10,20,replace=TRUE)
df2<-data.frame(Group,Ratings)
df2

Output

    Group Ratings
1   GRP_1       6
2   GRP_2       9
3   GRP_3       7
4   GRP_4      10
5   GRP_5      10
6   GRP_6       9
7   GRP_7       9
8   GRP_8       3
9   GRP_9       2
10 GRP_10       0
11 GRP_11       3
12 GRP_12       7
13 GRP_13       6
14 GRP_14      10
15 GRP_15       1
16 GRP_16       3
17 GRP_17      10
18 GRP_18       2
19 GRP_19       9
20 GRP_20       0

Removing everything up to and including the underscore from the values in column Group −

Example

df2$Group<-gsub("^.*_","",df2$Group)
df2

Output

   Group Ratings
1      1       6
2      2       9
3      3       7
4      4      10
5      5      10
6      6       9
7      7       9
8      8       3
9      9       2
10    10       0
11    11       3
12    12       7
13    13       6
14    14      10
15    15       1
16    16       3
17    17      10
18    18       2
19    19       9
20    20       0
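After the prefix is removed, the column still holds character values, so sorting it would put "10" before "2". If the remaining part is always a whole number, it may be worth converting it. The sketch below assumes a small hypothetical data frame df3, built in the same style as df2, and shows one possible way to do this with as.integer.

```r
# Hypothetical data frame in the same style as df2, deliberately out of order
df3 <- data.frame(Group = paste0("GRP_", c(10, 2, 1)), Ratings = c(5, 7, 3))

# Strip the prefix; gsub returns character values
df3$Group <- gsub("^.*_", "", df3$Group)
print(class(df3$Group))

# Convert to integer so that ordering is numeric rather than alphabetical
df3$Group <- as.integer(df3$Group)
df3 <- df3[order(df3$Group), ]
print(df3)
```

If some values do not contain a number after the underscore, as.integer returns NA for them with a warning, so it is a good idea to check for NA values after the conversion.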

Updated on: 19-Oct-2020
