Select rows with any infinite value (Inf or -Inf)

How do I subset rows from a data frame which have at least one infinite value (Inf or -Inf)?

Here is an example data frame:

my_data <- data.frame(column1 = c(Inf, 5, 3,4,5), 
                      column2 = c(1, Inf, -Inf, NA, 33))

I tried:

my_data[rowSums(is.infinite(my_data)) > 0, ]

But got the error:

Error in is.infinite(my_data) : default method not implemented for type 'list'

This is surprising, as the is.na() equivalent works fine:

my_data[rowSums(is.na(my_data)) > 0, ]

I was able to find methods to change Inf values to NA, but this is not quite what I am looking for. I only want to display all rows that contain an Inf or -Inf, rather than replace them with NA.

EDIT: If there is a method of doing this for a data frame with many columns, without individually typing out each column, that would be ideal.

Any help would be appreciated!

CodePudding user response:

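is.infinite() has no method for data frames. A data frame is a list of columns, which is why the error mentions type 'list'. An alternative is to apply it column by column with sapply():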

my_data[rowSums(sapply(my_data, is.infinite)) > 0, ]

#   column1 column2
# 1     Inf       1
# 2       5     Inf
# 3       3    -Inf
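One caveat with the sapply() form: on a data frame with a single row, sapply() returns a plain vector rather than a matrix, so rowSums() fails. A minimal base R sketch that avoids this by OR-ing the per-column results is shown here. The helper name rows_with_inf is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the thread): flag rows containing Inf or -Inf.
rows_with_inf <- function(df) {
  # One logical vector per column; is.infinite() is FALSE for NA
  # and for character columns.
  flags <- lapply(df, is.infinite)
  # Element-wise OR across columns, starting from all-FALSE so that a
  # data frame with zero columns still yields one flag per row.
  Reduce(`|`, flags, rep(FALSE, nrow(df)))
}

my_data <- data.frame(column1 = c(Inf, 5, 3, 4, 5),
                      column2 = c(1, Inf, -Inf, NA, 33))

keep <- rows_with_inf(my_data)
my_data[keep, ]

# Also works when only one row is present.
one_row <- my_data[1, ]
one_row[rows_with_inf(one_row), ]
```

Because the result is an ordinary logical vector, it can be negated (!keep) to drop the same rows instead of selecting them.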

With dplyr, you could use if_any() to apply is.infinite() to a selection of columns and combine the results into a single logical vector. (if_all() works the same way but keeps only rows where every selected column is infinite.)

library(dplyr)

my_data %>%
  filter(if_any(where(is.numeric), is.infinite))
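To make the difference between the two helpers concrete, here is a small sketch on the example data from the question. It assumes a dplyr version that provides if_any() and if_all() (1.0.4 or later); the variable names any_inf and all_inf are only illustrative.

```r
library(dplyr)

my_data <- data.frame(column1 = c(Inf, 5, 3, 4, 5),
                      column2 = c(1, Inf, -Inf, NA, 33))

# Rows where at least one numeric column is Inf or -Inf.
any_inf <- my_data %>%
  filter(if_any(where(is.numeric), is.infinite))

# Rows where every numeric column is Inf or -Inf.
all_inf <- my_data %>%
  filter(if_all(where(is.numeric), is.infinite))

nrow(any_inf)
nrow(all_inf)
```

For the question as asked ("at least one infinite value"), if_any() is the one that applies; no row of the example data has both columns infinite, so the if_all() version returns no rows.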

CodePudding user response:

From the documentation, help("is.infinite"), last paragraph of section Details:

All three functions are generic: you can write methods to handle specific classes of objects, see InternalMethods.

So a solution is to write .list and .data.frame methods for is.finite, is.infinite and is.nan. But beware: on a system where these methods have not been defined, you will still get the error in the question.

is.finite.list <- function(x) {
  x[] <- lapply(x, base::is.finite)
  x
}
is.finite.data.frame <- function(x) {
  x[] <- lapply(x, base::is.finite)
  x
}
is.infinite.list <- function(x) {
  x[] <- lapply(x, base::is.infinite)
  x
}
is.infinite.data.frame <- function(x) {
  x[] <- lapply(x, base::is.infinite)
  x
}
is.nan.list <- function(x) {
  x[] <- lapply(x, base::is.nan)
  x
}
is.nan.data.frame <- function(x) {
  x[] <- lapply(x, base::is.nan)
  x
}

my_data <- data.frame(column1 = c(Inf, 5, 3,4,5), 
                      column2 = c(1, Inf, -Inf, NA, 33))

is.infinite(my_data)
#>   column1 column2
#> 1    TRUE   FALSE
#> 2   FALSE    TRUE
#> 3   FALSE    TRUE
#> 4   FALSE   FALSE
#> 5   FALSE   FALSE

is.finite(my_data)
#>   column1 column2
#> 1   FALSE    TRUE
#> 2    TRUE   FALSE
#> 3    TRUE   FALSE
#> 4    TRUE   FALSE
#> 5    TRUE    TRUE

is.nan(my_data)
#>   column1 column2
#> 1   FALSE   FALSE
#> 2   FALSE   FALSE
#> 3   FALSE   FALSE
#> 4   FALSE   FALSE
#> 5   FALSE   FALSE

# The question code line throwing the error
my_data[rowSums(is.infinite(my_data)) > 0, ]
#>   column1 column2
#> 1     Inf       1
#> 2       5     Inf
#> 3       3    -Inf

Created on 2022-08-05 by the reprex package (v2.0.1)
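The asymmetry the question found surprising can be checked directly: base R ships a data.frame method for is.na() but, as far as I know, none for is.infinite(). The sketch below should be run in a fresh session, before defining any of the methods above; has_method is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if an S3 method generic.class can be found.
has_method <- function(generic, class) {
  !is.null(utils::getS3method(generic, class, optional = TRUE))
}

# is.na() has a data.frame method, which is why the is.na() line works.
has_method("is.na", "data.frame")

# is.infinite() has none in base R, hence the error in the question.
has_method("is.infinite", "data.frame")
```

After defining is.infinite.data.frame as shown above, the second call would report that a method exists.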

CodePudding user response:

Something like this should work:

library(tidyverse)

my_data <- data.frame(column1 = c(Inf, 5, 3,4,5), 
                      column2 = c(1, Inf, -Inf, NA, 33))


my_data
#   column1 column2
# 1     Inf       1
# 2       5     Inf
# 3       3    -Inf
# 4       4      NA
# 5       5      33

my_data %>% 
  filter(is.infinite(column1) | is.infinite(column2))
#   column1 column2
# 1     Inf       1
# 2       5     Inf
# 3       3    -Inf

If you have too many columns to individually name, you could use if_any() (credit to @RuiBarradas), like so:

# Across all columns
my_data %>% 
  filter(if_any(everything(), is.infinite)) 

# Across a range of columns
my_data %>% 
  filter(if_any(column1:column2, is.infinite))