Remove all rows from data.table if there is any infinite value

Time:01-29

In the toy example below, I want to delete all rows that contain Inf, -Inf or NaN values. My actual data.table has many more columns.

Group <- c("A", "B", "C", "D", "E", "F", "G")
LRR <- c(Inf, 1, 2, 3, -Inf, 4, 5)
LRR.var <- c(NaN, Inf, 3, -Inf, -Inf, 6, 7)
data <- data.table(cbind(Group, LRR, LRR.var))
data

 Group  LRR  LRR.var
 A      Inf  NaN
 B      1    Inf
 C      2    3
 D      3   -Inf
 E     -Inf -Inf
 F      4    6
 G      5    7

To delete all the rows in one go, I am using the following code but getting an error -

Code -

data[!is.finite(data)]

Error -

Error: default method not implemented for type 'list'

Can someone suggest a method to delete all rows with any NaN or Inf values from data.table in one go?

I do not want to use code like the one below, because then I would have to name every column one by one to check it for infinite values.

data[is.finite(data$LRR) & is.finite(data$LRR.var), ]

CodePudding user response:

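All columns are of class character, so is.finite and is.infinite return FALSE for every element; they only give meaningful results on numeric (or complex) columns. (The error in the question arises separately: is.finite was called on the whole data.table, which is a list.) According to ?is.infinite: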

is.infinite returns a vector of the same length as x the jth element of which is TRUE if x[j] is infinite (i.e., equal to one of Inf or -Inf) and FALSE otherwise. This will be false unless x is numeric or complex. Complex numbers are infinite if either the real or the imaginary part is.

> str(data)
Classes ‘data.table’ and 'data.frame':  7 obs. of  3 variables:
 $ Group  : chr  "A" "B" "C" "D" ...
 $ LRR    : chr  "Inf" "1" "2" "3" ...
 $ LRR.var: chr  "NaN" "Inf" "3" "-Inf" ...
> is.finite(data$LRR)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> is.infinite(data$LRR)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
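The behaviour described above can be checked in isolation. The following is a minimal base R sketch (the names `chr` and `err` are made up for illustration) showing that is.finite returns FALSE for strings, works once they are converted to numeric, and raises an error when handed a list, which is what a data.table is underneath:

```r
# is.finite() on a character vector: every element is FALSE,
# even for strings that look like numbers.
chr <- c("Inf", "1", "2")
is.finite(chr)              # FALSE FALSE FALSE
is.finite(as.numeric(chr))  # FALSE  TRUE  TRUE

# is.finite() on a list raises an error instead of returning
# a logical vector; capture the message rather than stopping.
err <- tryCatch(
  is.finite(list(1, Inf)),
  error = function(e) conditionMessage(e)
)
err
```
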

So the columns need to be converted to numeric first. Since data is a data.table, we can then subset it with data.table syntax, keeping the rows where every numeric column is finite:

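library(data.table)
# convert the character columns to their natural types (numeric where possible)
data <- type.convert(data, as.is = TRUE)
# keep rows where is.finite() is TRUE in every numeric column
data[data[, Reduce(`&`,
     lapply(.SD, is.finite)), .SDcols = is.numeric]]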

Output:

    Group LRR LRR.var
1:     C   2       3
2:     F   4       6
3:     G   5       7
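To see how the row filter is assembled, here is a step-by-step sketch of the same idea. It deliberately uses a plain data.frame so that it runs with base R only; the names `df`, `num_cols`, `per_col`, `keep` and `clean` are made up for illustration and are not part of the answer above.

```r
# Toy data with proper numeric columns.
df <- data.frame(
  Group   = c("A", "B", "C", "D", "E", "F", "G"),
  LRR     = c(Inf, 1, 2, 3, -Inf, 4, 5),
  LRR.var = c(NaN, Inf, 3, -Inf, -Inf, 6, 7)
)

# Step 1: pick the numeric columns.
num_cols <- names(df)[sapply(df, is.numeric)]

# Step 2: one logical vector per numeric column.
per_col <- lapply(df[num_cols], is.finite)

# Step 3: AND the vectors together, element by element.
keep <- Reduce(`&`, per_col)

# Step 4: keep the rows where every numeric column is finite.
clean <- df[keep, ]
as.character(clean$Group)  # "C" "F" "G"
```

Because the columns are selected by type rather than by name, the same code should work unchanged when more numeric columns are added.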

Note: All columns are character because cbind, when given only atomic vectors, builds a matrix. A matrix can hold only a single type, so the numeric values are coerced to character to match 'Group'. Instead, create the data.table or data.frame directly:

data <- data.table(Group, LRR, LRR.var)
> str(data)
Classes ‘data.table’ and 'data.frame':  7 obs. of  3 variables:
 $ Group  : chr  "A" "B" "C" "D" ...
 $ LRR    : num  Inf 1 2 3 -Inf ...
 $ LRR.var: num  NaN Inf 3 -Inf -Inf ...
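The coercion is easy to reproduce on a smaller scale. This is a minimal base R sketch with shortened vectors (the names `m` and `d` are made up for illustration), comparing cbind on atomic vectors with building a data.frame directly:

```r
Group <- c("A", "B", "C")
LRR   <- c(Inf, 1, 2)

# cbind() on atomic vectors returns a matrix, which holds a single type.
m <- cbind(Group, LRR)
typeof(m)    # "character"
m[, "LRR"]   # "Inf" "1" "2", now strings

# Building the data.frame directly lets each column keep its own type.
d <- data.frame(Group, LRR)
sapply(d, class)
```
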

Another option is dplyr's filter() combined with if_all(), which keeps the rows where is.finite is TRUE for all numeric columns:

library(dplyr)
data %>% 
  filter(if_all(where(is.numeric), is.finite))
   Group LRR LRR.var
1:     C   2       3
2:     F   4       6
3:     G   5       7

CodePudding user response:

To avoid the conversion from numeric to character when creating your data.table, you can use cbind.data.frame instead of cbind:

Group<-c("A","B","C","D","E","F","G")
LRR <- c(1, Inf,2,3, -Inf,4, 5)
LRR.var <- c(Inf, Inf, 3, -Inf, -Inf, 6,7)
data<-data.table(cbind.data.frame(Group, LRR, LRR.var))
str(data)

Output:

Classes ‘data.table’ and 'data.frame':  7 obs. of  3 variables:
 $ Group  : chr  "A" "B" "C" "D" ...
 $ LRR    : num  1 Inf 2 3 -Inf ...
 $ LRR.var: num  Inf Inf 3 -Inf -Inf ...
 - attr(*, ".internal.selfref")=<externalptr> 

Then a possible solution is to convert the infinite values to NA and finally drop the rows containing NA with drop_na (from tidyr):

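library(tidyr)  # provides drop_na()
# mark every Inf / -Inf cell as NA, then drop the rows that contain NA
is.na(data) <- sapply(data, is.infinite)
data %>% drop_na()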

Output:

   Group LRR LRR.var
1:     C   2       3
2:     F   4       6
3:     G   5       7
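One detail is worth checking with this approach: is.infinite does not flag NaN, but NaN still counts as missing for is.na, so rows with NaN are dropped along with the NA rows. A small base R sketch on a made-up vector `x`:

```r
x <- c(1, Inf, -Inf, NaN, NA)

is.infinite(x)  # TRUE only for Inf and -Inf
is.finite(x)    # TRUE only for ordinary numbers
is.na(x)        # TRUE for both NaN and NA

# Replace infinite values with NA, then drop everything missing.
x[is.infinite(x)] <- NA
x[!is.na(x)]    # 1
```
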