Filter using Dataset Position in R

I'm not very familiar with the dplyr functions in R. However, I want to filter my dataset on certain conditions.

Let's say I have more than 100 attributes in my dataset, and I want to filter on multiple conditions.

Can I refer to the columns by position instead of by name in the filter, as follows:

y = filter(retag, c(4:50) != 8 & c(90:110) == 8)

I've tried code similar to this a few times, but I still haven't got the result I want.

I also tried the following code, but I'm not sure how to add further conditions to the rowSums call.

retag[rowSums((retag!=8)[,c(4:50)])>=1,]

The only examples I found used the column names instead of their positions.

Is there any way to filter by column position? My data is quite large.

CodePudding user response：

You can use a combination of filter() and across(). I didn't have your version of the retag data frame, so I created my own as an example:

# dplyr provides filter() and across(), and re-exports tibble() and %>%
library(dplyr)

set.seed(2000)

retag <- tibble(
  col1 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col2 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col3 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col4 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col5 = runif(n = 1000, min = 0, max = 10) %>% round(0)
)

# filter where the first, second, and third column all equal 5 and the fourth column does not equal 5
retag %>%
  filter(
    across(1:3, function(x) x == 5), 
    across(4, function(x) x != 5)
  )
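
Newer dplyr releases deprecate across() inside filter() in favour of if_all() and if_any(), which also accept column positions. Here is a minimal sketch of the condition from the question, assuming it means "at least one column in the first range is not 8, and every column in the second range is 8". The toy data and the ranges 2:3 and 4:5 are made up for illustration and stand in for 4:50 and 90:110.

```r
library(dplyr)

# Hypothetical toy data standing in for the asker's `retag`
retag <- tibble(
  id = 1:4,
  a1 = c(8, 8, 1, 8),
  a2 = c(8, 2, 8, 8),
  b1 = c(8, 8, 8, 3),
  b2 = c(8, 8, 8, 8)
)

# Keep rows where at least one of columns 2:3 is not 8
# and every one of columns 4:5 equals 8
y <- retag %>%
  filter(
    if_any(2:3, ~ .x != 8),
    if_all(4:5, ~ .x == 8)
  )

y$id
```

On the real data the same call would use the actual ranges, for example if_any(4:50, ...) and if_all(90:110, ...). Swap if_any() for if_all() if every column in the first range has to differ from 8.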

CodePudding user response：

if_all() and if_any() were introduced in dplyr 1.0.4 for the purpose of filtering across multiple variables.

library(dplyr)

filter(retag, if_all(X:Y, ~ .x > 10 & .x < 35))

# # A tibble: 5 x 2
#       X     Y
#   <int> <int>
# 1    11    30
# 2    12    31
# 3    13    32
# 4    14    33
# 5    15    34

filter(retag, if_any(X:Y, ~ .x == 2 | .x == 25))

# # A tibble: 2 x 2
#       X     Y
#   <int> <int>
# 1     2    21
# 2     6    25
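
For comparison, the rowSums() attempt from the question can also take a second condition in base R: build one logical vector per column range and combine them with &. This is only a sketch on made-up data. The names first, second, any_not_8 and all_8 are hypothetical, and it assumes the columns contain no NA values.

```r
# Hypothetical toy data; columns 2:3 and 4:5 stand in for 4:50 and 90:110
retag <- data.frame(
  id = 1:4,
  a1 = c(8, 8, 1, 8),
  a2 = c(8, 2, 8, 8),
  b1 = c(8, 8, 8, 3),
  b2 = c(8, 8, 8, 8)
)

first  <- 2:3
second <- 4:5

# At least one column in `first` differs from 8
# (drop = FALSE keeps a data frame even for a single column)
any_not_8 <- rowSums(retag[, first, drop = FALSE] != 8) >= 1

# Every column in `second` equals 8
all_8 <- rowSums(retag[, second, drop = FALSE] == 8) == length(second)

# Combine both row-wise conditions
result <- retag[any_not_8 & all_8, ]
result$id
```

Each rowSums() call counts how many columns in its range satisfy the comparison, so ">= 1" means "any" and "== length(second)" means "all".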

Data

retag <- structure(list(X = 1:20, Y = 20:39), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))