How do I subset rows from a data frame which have at least one infinite value (Inf or -Inf)?
Here is an example data frame:
my_data <- data.frame(column1 = c(Inf, 5, 3,4,5),
column2 = c(1, Inf, -Inf, NA, 33))
I tried:
my_data[rowSums(is.infinite(my_data)) > 0, ]
But got the error:
Error in is.infinite(my_data) : default method not implemented for type 'list'
Which is surprising, as the is.na() equivalent works fine:
my_data[rowSums(is.na(my_data)) > 0, ]
I was able to find methods to change Inf values to NA, but this is not quite what I am looking for. I only want to display all rows that contain an Inf or -Inf, rather than replace them with NA.
EDIT: If there is a method of doing this for a data frame with many columns, without individually typing out each column, that would be ideal.
Any help would be appreciated!
CodePudding user response:
It seems that is.infinite cannot be applied to a data.frame directly. An alternative is to apply it to each column with sapply:
my_data[rowSums(sapply(my_data, is.infinite)) > 0, ]
# column1 column2
# 1 Inf 1
# 2 5 Inf
# 3 3 -Inf
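One caveat with the sapply version: on a data frame with a single row, sapply returns a plain vector rather than a matrix, so rowSums fails. A minimal sketch that sidesteps this, assuming base R only, combines the per-column results with Reduce (the names keep and result are just for illustration):

```r
my_data <- data.frame(column1 = c(Inf, 5, 3, 4, 5),
                      column2 = c(1, Inf, -Inf, NA, 33))

# One logical vector per column, then an element-wise OR across columns
keep <- Reduce(`|`, lapply(my_data, is.infinite))

# Rows with at least one Inf or -Inf
result <- my_data[keep, ]
```

Since is.infinite(NA) is FALSE, the row containing only an NA is not selected.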
With dplyr, you could use if_any or if_all to apply is.infinite to a selection of columns and combine the results into a single logical vector.
library(dplyr)
my_data %>%
  filter(if_any(where(is.numeric), is.infinite))
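To make the difference between the two helpers concrete, here is a small sketch. It assumes dplyr 1.0.4 or later (where if_any and if_all were introduced), and the data frame toy is hypothetical, chosen so the two give different results:

```r
library(dplyr)

# Hypothetical toy data: row 1 is infinite in both columns,
# rows 2 and 3 are infinite in only one
toy <- data.frame(a = c(Inf, 1, -Inf),
                  b = c(Inf, Inf, 2))

# Rows where at least one numeric column is infinite
any_inf <- toy %>%
  filter(if_any(where(is.numeric), is.infinite))

# Rows where every numeric column is infinite
all_inf <- toy %>%
  filter(if_all(where(is.numeric), is.infinite))
```

Here any_inf keeps all three rows, while all_inf keeps only the first. For the question as asked, if_any is the one you want.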
CodePudding user response:
From the documentation, help("is.infinite"), last paragraph of section Details:
All three functions are generic: you can write methods to handle specific classes of objects, see InternalMethods.
So a solution is to write .list and .data.frame methods for is.finite, is.infinite and is.nan. But beware: in any session where these methods have not been defined, you will still get the error in the question.
is.finite.list <- function(x) {
  x[] <- lapply(x, base::is.finite)
  x
}
is.finite.data.frame <- function(x) {
  x[] <- lapply(x, base::is.finite)
  x
}
is.infinite.list <- function(x) {
  x[] <- lapply(x, base::is.infinite)
  x
}
is.infinite.data.frame <- function(x) {
  x[] <- lapply(x, base::is.infinite)
  x
}
is.nan.list <- function(x) {
  x[] <- lapply(x, base::is.nan)
  x
}
is.nan.data.frame <- function(x) {
  x[] <- lapply(x, base::is.nan)
  x
}
my_data <- data.frame(column1 = c(Inf, 5, 3,4,5),
column2 = c(1, Inf, -Inf, NA, 33))
is.infinite(my_data)
#> column1 column2
#> 1 TRUE FALSE
#> 2 FALSE TRUE
#> 3 FALSE TRUE
#> 4 FALSE FALSE
#> 5 FALSE FALSE
is.finite(my_data)
#> column1 column2
#> 1 FALSE TRUE
#> 2 TRUE FALSE
#> 3 TRUE FALSE
#> 4 TRUE FALSE
#> 5 TRUE TRUE
is.nan(my_data)
#> column1 column2
#> 1 FALSE FALSE
#> 2 FALSE FALSE
#> 3 FALSE FALSE
#> 4 FALSE FALSE
#> 5 FALSE FALSE
# The question code line throwing the error
my_data[rowSums(is.infinite(my_data)) > 0, ]
#> column1 column2
#> 1 Inf 1
#> 2 5 Inf
#> 3 3 -Inf
Created on 2022-08-05 by the reprex package (v2.0.1)
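If you would rather not define methods on base generics for the whole session, the same per-column logic can live in an ordinary function. This is only a sketch, and the name is_infinite_df is hypothetical:

```r
# Hypothetical helper: same per-column logic as the methods above,
# but under its own name, so is.infinite itself is left untouched
is_infinite_df <- function(df) {
  df[] <- lapply(df, is.infinite)  # keeps the data frame shape and names
  df
}

my_data <- data.frame(column1 = c(Inf, 5, 3, 4, 5),
                      column2 = c(1, Inf, -Inf, NA, 33))

# Rows with at least one Inf or -Inf
result <- my_data[rowSums(is_infinite_df(my_data)) > 0, ]
```

The trade-off is that the call site has to use the helper's name, but the code then behaves the same in any session that sources the function.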
CodePudding user response:
Something like this should work:
library(tidyverse)
my_data <- data.frame(column1 = c(Inf, 5, 3,4,5),
column2 = c(1, Inf, -Inf, NA, 33))
my_data
# column1 column2
# 1 Inf 1
# 2 5 Inf
# 3 3 -Inf
# 4 4 NA
# 5 5 33
my_data %>%
  filter(is.infinite(column1) | is.infinite(column2))
# column1 column2
# 1 Inf 1
# 2 5 Inf
# 3 3 -Inf
If you have too many columns to individually name, you could use if_any() (credit to @RuiBarradas), like so:
# Across all columns
my_data %>%
  filter(if_any(everything(), is.infinite))
# Across a range of columns
my_data %>%
  filter(if_any(column1:column2, is.infinite))
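If the columns to check are held in a character vector rather than forming a contiguous range, all_of() should work inside if_any() as well. A small sketch, assuming dplyr 1.0.4 or later; the data frame wide_data and the vector cols are hypothetical:

```r
library(dplyr)

# Hypothetical data: column3 also contains an Inf, but is not checked
wide_data <- data.frame(column1 = c(Inf, 5, 3),
                        column2 = c(1, 2, 3),
                        column3 = c(1, Inf, 3))

# Names of the columns to check, stored in a variable
cols <- c("column1", "column2")

# Across the columns named in cols only
result <- wide_data %>%
  filter(if_any(all_of(cols), is.infinite))
```

Only the first row is kept here, because the Inf in column3 falls outside the selected columns.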