Is there an R function to filter a data frame by a determined range of values?

Time:11-23

This question has probably been answered somewhere already, but I am new to R and after a couple of hours searching for a solution I have not found anything, so I apologise in advance for a probably redundant question.

I have a data frame consisting of 3 columns and more than 10,000 rows. The columns are time, speed, and acceleration. What I want to do is select only those rows in which the speed is equal to 3, or 3.2, or 3.4, or 3.6, and so on up to the maximum value of speed present in this specific data frame (if the maximum were 10.3, the last value I would like to filter by would be 10.2). However, this maximum will be different in future data frames like this one (the data come from football players, each with a specific maximum speed).

My data frame is called "ASdata" and I want to create another data frame called "ASdata_only" with only rows in which speed is == 3, 3.2, 3.4... until maximum value (increasing by 0.2).

This is what I tried:

ASdata_only<-ASdata %>% filter(speed==3 | speed==3.2 | speed==3.4)

But this has two problems:

(1) It is very time consuming because in this way I would have to include every speed and surely there is a simpler way to do it.

(2) For every dataframe, I would have to previously find the maximum value to know when to stop.

I do not even know if the question is well structured, and I am sorry for that too. Thank you very much in advance for the help!

CodePudding user response:

Using dplyr

library(dplyr)

a <- 3                  # smallest speed to keep
b <- max(ASdata$speed)  # largest speed in the data frame
d <- 0.2                # step size

# round() removes the tiny floating-point error that seq() can introduce
ASdata_only <- ASdata %>% filter(speed %in% round(seq(a, b, d), 1))

CodePudding user response:

You can build a numeric vector with seq(), which takes the minimum value, the maximum value and the step size. Then you can use filter() with %in% to keep only the rows whose speed appears in that sequence:

my_list <- round(seq(3, 10.4, 0.2), 1)

ASdata %>% filter(speed %in% my_list)
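
A small sketch of why the upper bound does not have to be typed in by hand: seq() never goes past its `to` argument, so the observed maximum can be passed straight in and the sequence stops at the last step below it. The speeds here are made-up values standing in for ASdata$speed, and `steps` is just an illustrative name; only base R is used.

```r
# Made-up speeds standing in for ASdata$speed (not real data)
speed <- c(2.8, 3.0, 3.1, 3.2, 5.4, 10.2, 10.3)

# seq() stops at the last value that does not exceed `to`,
# so max(speed) works as the upper bound even when it is off the grid
steps <- round(seq(3, max(speed), by = 0.2), 1)

range(steps)    # first and last value of the sequence
length(steps)   # how many values it contains

# Logical index: TRUE where the speed is one of the steps
keep <- speed %in% steps
speed[keep]
```

With dplyr the same `steps` vector can be used as filter(speed %in% steps).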

To solve the issue in the comment, it's simplest to write a function:

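only_in_range <- function(data, speed_var = speed) {
  # Smallest and largest value of the speed column
  min_speed <- summarise(data, min = min({{ speed_var }})) %>% pull(min)
  max_speed <- summarise(data, max = max({{ speed_var }})) %>% pull(max)
  # Steps of 0.2, rounded to 1 decimal place
  steps <- round(seq(min_speed, max_speed, 0.2), 1)

  data %>%
    filter({{ speed_var }} %in% steps)
}

The function finds the min and max values of the speed variable. The column defaults to speed, but you can pass another name if it is called something different in other data frames.

It then creates a sequence rounded to 1 decimal place (named steps), to try to avoid floating point issues, and filters your data frame for only the values in that sequence. Note that the sequence starts at the smallest speed in the data, so it lines up with 3, 3.2, 3.4, ... only if that minimum itself lies on that grid; if it does not, pass a fixed start such as 3 to seq() instead.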
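
As an alternative sketch (not what either answer above does), the floating point issue can be sidestepped by testing with a tolerance whether each speed sits on the 3, 3.2, 3.4, ... grid, which also removes the need to know the maximum at all. The helper `on_grid`, its arguments and the toy data are all hypothetical names and values chosen for illustration; only base R is used.

```r
# Why exact comparison is fragile: a classic floating-point case
0.1 * 3 == 0.3             # FALSE
round(0.1 * 3, 1) == 0.3   # TRUE

# Hypothetical helper: TRUE where x lies on the grid
# from, from + by, from + 2 * by, ... (within a small tolerance)
on_grid <- function(x, from = 3, by = 0.2, tol = 1e-8) {
  k <- (x - from) / by   # position on the grid, measured in steps
  x >= from - tol & abs(k - round(k)) < tol
}

# Toy data frame with the three columns from the question (made-up values)
ASdata <- data.frame(
  time         = 1:6,
  speed        = c(2.8, 3.0, 3.1, 3.2, 10.2, 10.3),
  acceleration = c(0.1, 0.4, -0.2, 0.3, 0.0, -0.5)
)

# Keep only the rows whose speed is on the grid
ASdata_only <- ASdata[on_grid(ASdata$speed), ]
ASdata_only
```

With dplyr loaded, the last step could equally be written as ASdata %>% filter(on_grid(speed)). The tolerance of 1e-8 is an arbitrary choice that is far smaller than the 0.1 resolution of the speeds described in the question.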
