How to convert a string in an R data frame to NA?

Mistakes in data collection are common. They can lead to incorrect research results and make the analyst's job harder. One sign of such mistakes is a string appearing where a numerical value is expected. To proceed with the intended analysis, we can convert these strings to NA in R.


Consider the below data frame −

> x1<-rep(c(1,3,6,7,5,2,"XYZ",12,4,5),times=2)
> x2<-rep(c(67,"XYZ",45,32,52),each=4)
> df<-data.frame(x1,x2)
> df
    x1  x2
1    1  67
2    3  67
3    6  67
4    7  67
5    5 XYZ
6    2 XYZ
7  XYZ XYZ
8   12 XYZ
9    4  45
10   5  45
11   1  45
12   3  45
13   6  32
14   7  32
15   5  32
16   2  32
17 XYZ  52
18  12  52
19   4  52
20   5  52
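
Before replacing anything, it can help to confirm which entries are actually non-numeric. The following is a minimal sketch, assuming that any value `as.numeric()` cannot parse should be treated as a data-entry mistake. The helper name `bad` is only illustrative.

```r
# Same vector as in the example above
x1 <- rep(c(1, 3, 6, 7, 5, 2, "XYZ", 12, 4, 5), times = 2)

# Mixing numbers and strings in c() yields a character vector
class(x1)

# Values that cannot be parsed as numbers become NA under as.numeric();
# suppressWarnings() hides the coercion warning that R would otherwise print
bad <- is.na(suppressWarnings(as.numeric(x1)))

unique(x1[bad])   # the distinct non-numeric entries
which(bad)        # their positions in the vector
```

If `unique(x1[bad])` returns more than one string, each of them may need to be converted to NA.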

Converting all XYZ’s to NA −

> df[df=="XYZ"]<-NA
> df
     x1  x2
 1   1   67
 2   3   67
 3   6   67
 4   7   67
 5   5  <NA>
 6   2  <NA>
 7 <NA> <NA>
 8  12  <NA>
 9   4   45
10   5   45
11   1   45
12   3   45
13   6   32
14   7   32
15   5   32
16   2   32
17 <NA>  52
18  12   52
19   4   52
20   5   52
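
Note that replacing the strings with NA does not by itself make the columns numeric; they are still stored as text. A minimal sketch of one possible follow-up step is shown here, assuming every remaining value can be parsed as a number. `stringsAsFactors = FALSE` is set explicitly because R versions before 4.0 would otherwise create factor columns.

```r
# Rebuild the example data frame with character columns
x1 <- rep(c(1, 3, 6, 7, 5, 2, "XYZ", 12, 4, 5), times = 2)
x2 <- rep(c(67, "XYZ", 45, 32, 52), each = 4)
df <- data.frame(x1, x2, stringsAsFactors = FALSE)

# df == "XYZ" is a logical matrix; the TRUE cells are overwritten with NA
df[df == "XYZ"] <- NA

# The columns are still character, so convert each one to numeric;
# df[] keeps the data frame structure while replacing the columns
df[] <- lapply(df, as.numeric)

str(df)                      # both columns are now num
colSums(is.na(df))           # number of NA values per column
mean(df$x1, na.rm = TRUE)    # numeric summaries now work, skipping NA
```

After this step, functions such as `mean()` or `sum()` can be used with `na.rm = TRUE`.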

Let’s have a look at one more example −

> ID<-c("Class",1:19)
> Group<-rep(c("Class",2,3,4,5),times=4)
> df1<-data.frame(ID,Group)
> df1
      ID Group
1  Class Class
2      1     2
3      2     3
4      3     4
5      4     5
6      5 Class
7      6     2
8      7     3
9      8     4
10     9     5
11    10 Class
12    11     2
13    12     3
14    13     4
15    14     5
16    15 Class
17    16     2
18    17     3
19    18     4
20    19     5
> df1[df1=="Class"]<-NA
> df1
     ID Group
1  <NA>  <NA>
2     1     2
3     2     3
4     3     4
5     4     5
6     5  <NA>
7     6     2
8     7     3
9     8     4
10    9     5
11   10  <NA>
12   11     2
13   12     3
14   13     4
15   14     5
16   15  <NA>
17   16     2
18   17     3
19   18     4
20   19     5
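
If the data comes from a file, the same conversion can be done while reading it. The sketch below assumes a CSV source and uses the `na.strings` argument of `read.csv()`. The inline `csv_text` is hypothetical content standing in for a real file, and `df2` is an illustrative name.

```r
# Hypothetical CSV content standing in for a real file
csv_text <- "ID,Group
Class,Class
1,2
2,3
3,Class"

# Every field equal to "Class" (or the default "NA") is read as NA,
# so the remaining values can be parsed as numbers directly
df2 <- read.csv(text = csv_text, na.strings = c("Class", "NA"))

str(df2)
colSums(is.na(df2))   # number of NA values per column
```

With this approach the columns are read as numeric from the start, so no separate conversion step is needed.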

Updated on: 12-Aug-2020

