How to find the last non-missing value in each column of a data.table object?


To find the last non-missing value in each column of a data.table object, we can use lapply function along with tail function for NA values.

For example, if we have a data.table object called DT then we can find the last non-missing value in each column of DT by using the below given command −

DT[,lapply(.SD,function(x) tail(x[!is.na(x)],1))]

Example 1

Following snippet creates a data.table object −

library(data.table)
x1<-sample(c(NA,1,2),20,replace=TRUE)
x2<-sample(c(NA,5,10),20,replace=TRUE)
x3<-sample(c(NA,100,200),20,replace=TRUE)
DT1<-data.table(x1,x2,x3)
DT1

The following data.table object is created −

     x1 x2 x3
1:   2  5 200
2:  NA  5 200
3:   1  5 100
4:  NA NA 100
5:   1 10 200
6:  NA 10 100
7:  NA NA 200
8:   2 10 100
9:   1 10 100
10:  1  5 200
11:  1  5  NA
12: NA  5  NA
13:  2  5 100
14: NA  5 100
15: NA 10 100
16:  2  5 200
17: NA  5  NA
18: NA NA  NA
19:  2 10 200
20:  1 NA 100

To find the last non-missing value in each column of DT1, add the following code to the above snippet −

DT1[,lapply(.SD,function(x) tail(x[!is.na(x)],1))]

Output

If you execute all the above given snippets as a single program, it generates the following output −

   x1 x2 x3
1: 1 10 100

Example 2

Following snippet creates a data.table object −

y1<-sample(c(NA,rnorm(2)),20,replace=TRUE)
y2<-sample(c(NA,rnorm(2)),20,replace=TRUE)
y3<-sample(c(NA,rnorm(2)),20,replace=TRUE)
DT2<-data.table(y1,y2,y3)
DT2

The following data.table object is created −

       y1         y2          y3
1:  -0.7819176  0.5451545 -1.7242281
2:  -0.4305358  NA         NA
3:  -0.7819176  0.5451545  NA
4:  -0.4305358  NA         0.3926004
5:   NA         0.2671401 -1.7242281
6:  -0.7819176  0.2671401  0.3926004
7:  -0.7819176  NA         NA
8:  -0.4305358  0.5451545 -1.7242281
9:  -0.4305358  NA        -1.7242281
10: -0.7819176  0.5451545  NA
11: -0.4305358  NA         NA
12:  NA         0.5451545  0.3926004
13: -0.4305358  NA         0.3926004
14: -0.4305358  0.5451545 -1.7242281
15:  NA         0.2671401  0.3926004
16:  NA         0.5451545 -1.7242281
17:  NA         NA         NA
18: -0.4305358  NA         NA
19: -0.7819176  0.2671401  NA
20: -0.7819176  NA        -1.7242281

To find the last non-missing value in each column of DT2, add the following code to the above snippet −

DT2[,lapply(.SD,function(x) tail(x[!is.na(x)],1))]

Output

If you execute all the above given snippets as a single program, it generates the following output −

y1 y2 y3
1: -0.7819176 0.2671401 -1.724228

Example 3

Following snippet creates a data.table object −

z1<-sample(c(NA,round(runif(2,2,10),2)),20,replace=TRUE)
z2<-sample(c(NA,round(runif(2,1,2),2)),20,replace=TRUE)
z3<-sample(c(NA,round(runif(2,10,115),2)),20,replace=TRUE)
DT3<-data.table(z1,z2,z3)
DT3

The following data.table object is created −

     z1    z2    z3
1:  8.94  NA    71.40
2:  NA    NA    NA
3:  8.94  NA    NA
4:  8.94  NA    71.40
5:  4.09  1.48  NA
6:  4.09  NA    NA
7:  NA    1.38  71.40
8:  8.94  NA    NA
9:  4.09  NA    NA
10: 8.94  NA    NA
11: 4.09  NA    71.40
12: 8.94  1.48  107.44
13: 8.94  1.38  NA
14: 4.09  1.38  NA
15: 8.94  1.38  107.44
16: NA    1.48  71.40
17: NA    NA    107.44
18: NA    1.38  71.40
19: 4.09  NA    NA
20: 4.09  1.48  71.40

To find the last non-missing value in each column of DT3, add the following code to the above snippet −

DT3[,lapply(.SD,function(x) tail(x[!is.na(x)],1))]

Output

If you execute all the above given snippets as a single program, it generates the following output −

    z1    z2   z3
1: 4.09 1.48 71.4

Updated on: 12-Nov-2021

495 Views

Kickstart Your Career

Get certified by completing the course

Get Started
Advertisements