Filter using Dataset Position in R

Time:09-22

I'm not very familiar with the dplyr functions in R, but I want to filter my dataset on certain conditions.

Let's say my dataset has more than 100 attributes (columns), and I want to filter on multiple conditions.

Can I write the filter using the positions of the columns instead of their names, as follows:

y = filter(retag, c(4:50) != 8 & c(90:110) == 8)

I've tried code similar to this a few times, but still haven't got the result I want.

I also tried the following, but I'm not sure how to add further conditions to the rowSums call.

retag[rowSums((retag!=8)[,c(4:50)])>=1,]

The only examples I found use the column names instead of their positions.

Is there any way to filter by column position, given that my data is quite large?

CodePudding user response:

You can use a combination of filter() and across(), which accepts column positions as well as names. I didn't have your version of the retag data frame, so I created my own as an example:

library(dplyr)  # provides tibble(), %>%, filter() and across()

set.seed(2000)

retag <- tibble(
  col1 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col2 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col3 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col4 = runif(n = 1000, min = 0, max = 10) %>% round(0),
  col5 = runif(n = 1000, min = 0, max = 10) %>% round(0)
)

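# Keep rows where columns 1 to 3 all equal 5 and column 4 does not equal 5.
# Conditions separated by commas inside filter() are combined with AND.
# Note: newer dplyr versions deprecate across() inside filter() in favour of
# if_all() / if_any(), which take the same arguments.
retag %>%
  filter(
    across(1:3, function(x) x == 5),
    across(4, function(x) x != 5)
  )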
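
To connect this back to the condition in the question, here is a minimal sketch using if_any() and if_all() with column positions. The toy data frame and its column names are made up for illustration; positions 2:3 and 5:6 stand in for 4:50 and 90:110. It assumes "not equal to 8" should hold for at least one column of the first block (as in the rowSums attempt) and "equal to 8" for every column of the second block. Swap if_any() and if_all() if you mean something different.

```r
library(dplyr)

# Hypothetical toy data: positions 2:3 and 5:6 stand in for 4:50 and 90:110
toy <- tibble(
  id = 1:4,
  a  = c(8, 1, 8, 2),
  b  = c(8, 3, 8, 8),
  c  = c(0, 0, 0, 0),
  d  = c(8, 8, 8, 8),
  e  = c(8, 8, 1, 8)
)

# Keep rows where at least one of columns 2:3 differs from 8
# AND every one of columns 5:6 equals 8
res <- toy %>%
  filter(
    if_any(2:3, ~ .x != 8),
    if_all(5:6, ~ .x == 8)
  )

res$id
```

With this toy data only the rows with id 2 and 4 satisfy both conditions.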

CodePudding user response:

if_all() and if_any() were introduced in dplyr 1.0.4 for the purpose of filtering across multiple variables. They accept column positions (for example 4:50) as well as column names.

library(dplyr)

filter(retag, if_all(X:Y, ~ .x > 10 & .x < 35))

# # A tibble: 5 x 2
#       X     Y
#   <int> <int>
# 1    11    30
# 2    12    31
# 3    13    32
# 4    14    33
# 5    15    34

filter(retag, if_any(X:Y, ~ .x == 2 | .x == 25))

# # A tibble: 2 x 2
#       X     Y
#   <int> <int>
# 1     2    21
# 2     6    25
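
If you would rather stay in base R, the rowSums() attempt from the question can be extended by building one logical vector per block of columns and combining them with &. This is only a sketch on a made-up data frame: the names toy, first_block and second_block are hypothetical, and the positions 2:3 and 5:6 stand in for 4:50 and 90:110.

```r
# Hypothetical toy data: positions 2:3 and 5:6 stand in for 4:50 and 90:110
toy <- data.frame(
  id = 1:4,
  a  = c(8, 1, 8, 2),
  b  = c(8, 3, 8, 8),
  c  = c(0, 0, 0, 0),
  d  = c(8, 8, 8, 8),
  e  = c(8, 8, 1, 8)
)

first_block  <- 2:3
second_block <- 5:6

# TRUE where at least one column of the first block is not 8
any_not_8 <- rowSums(toy[, first_block, drop = FALSE] != 8) >= 1

# TRUE where every column of the second block is 8
all_8 <- rowSums(toy[, second_block, drop = FALSE] == 8) == length(second_block)

# Combine both row-wise conditions with &
res_base <- toy[any_not_8 & all_8, ]

res_base$id
```

Note that rowSums() returns NA for rows containing missing values unless na.rm = TRUE is set, so data with NAs may need extra handling.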

Data

retag <- structure(list(X = 1:20, Y = 20:39), row.names = c(NA, -20L), class = c("tbl_df", 
"tbl", "data.frame"))