Filtering R Rows based on data type

Time:02-02

I have a dataframe A with the following columns

SN     Sample1        Sample2

Sample 1 and 2 either have numeric values or some text to denote that no sampling was possible.

I need to keep any row that has at least one numeric value.

My idea was to filter out rows based on having no numeric values.

I normally use this: A[!is.na(as.numeric(A$Sample1)), ], but it only looks at one of the columns.

I need help extending this so that it checks both Sample1 and Sample2.

Basically, what I need done is

Sample 1 text    Sample 2 text    #then remove
Sample 1 numeric Sample 2 numeric #then keep
Sample 1 numeric Sample 2 text    #then keep
Sample 1 text    Sample 2 numeric #then keep

CodePudding user response:

In base R, you can use grepl() with a regular expression that matches numeric strings, build one logical vector per column, and combine the two with the "or" operator, |, to index the rows:

df[grepl("^-?\\d+\\.?\\d*$", df$a) | grepl("^-?\\d+\\.?\\d*$", df$b), ] #thanks @zephyryl

#   a b
# 1 1 A
# 2 2 B

Original Sample Data:

df <- data.frame(a = c(1, 2, "Abc"),
                 b = c(LETTERS[1:3]))
#     a b
# 1   1 A
# 2   2 B
# 3 Abc C
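
The regular expression is stricter than as.numeric(), which may matter depending on how the numbers are written. The sketch below assumes the pattern is meant to be ^-?\\d+\\.?\\d*$ and compares it with coercion on a few made-up strings; looks_numeric and the test values are hypothetical names and data, not part of the question.

```r
# Hypothetical helper wrapping the pattern from the answer above.
looks_numeric <- function(x) grepl("^-?\\d+\\.?\\d*$", x)

# Made-up test strings covering a few ways a number can be written.
x <- c("12", "-3.5", "7.", ".5", "1e3", " 4", "Abc", "")

comparison <- data.frame(
  value      = x,
  regex      = looks_numeric(x),
  # TRUE where base R can coerce the string to a number
  as_numeric = !is.na(suppressWarnings(as.numeric(x)))
)
print(comparison)
```

Strings such as ".5", "1e3", and " 4" are accepted by as.numeric() but rejected by this pattern, so the two approaches can keep different rows when the data uses those forms.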

CodePudding user response:

You could turn your existing code into a function (which also suppresses the "NAs introduced by coercion" warning and lets values that are already NA in the original vector pass), then apply it row-wise using apply():

is_coercible_numeric <- function(x) {
  is.na(x) | !is.na(suppressWarnings(as.numeric(x)))
}

A[apply(A, 1, \(row) any(is_coercible_numeric(row))), ]
#   Sample1 Sample2
# 2       2       2
# 3       3   text3
# 4   text4       4

Or using dplyr:

library(dplyr)

A %>% 
  filter(if_any(Sample1:Sample2, is_coercible_numeric))
#   Sample1 Sample2
# 1       2       2
# 2       3   text3
# 3   text4       4

Example data:

A <- data.frame(
  Sample1 = c("text1", 2, 3, "text4"),
  Sample2 = c("text1", 2, "text3", 4)
)
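
As a possible variation on the approach above, the check can also be run column by column and the results combined with |, which avoids converting each row through apply(). This is a minimal sketch, assuming the sample columns are character vectors rather than factors (as.numeric() on a factor returns its integer codes); sample_cols and keep are names introduced here for illustration.

```r
# Example data and helper repeated so the block runs on its own.
A <- data.frame(
  Sample1 = c("text1", 2, 3, "text4"),
  Sample2 = c("text1", 2, "text3", 4)
)

is_coercible_numeric <- function(x) {
  is.na(x) | !is.na(suppressWarnings(as.numeric(x)))
}

# Columns to inspect (illustrative name; extend the vector for more samples).
sample_cols <- c("Sample1", "Sample2")

# One logical vector per column, combined element-wise with "or".
keep <- Reduce(`|`, lapply(A[sample_cols], is_coercible_numeric))

A[keep, ]
```

Adding further sample columns only requires extending sample_cols.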