I'm beginning to introduce the tidyverse into my coding skills and I'm running into some trouble when trying to use a custom function in a pipe.
I have a dataset of patient data at two different timepoints. Example data:
dataset <- data.frame(patient_id = rep(1:5, each=6),
timepoint = rep(1:2, 15),
Mean = c(sample(100:130, 25),25,315,46,223,67),
Circ. = sample(40:99, 30)/100,
Perim. = sample(1000:2500, 30))
I want to group my data by patient_id and timepoint and then apply to each group a function that removes the rows with an outlier value in the Mean column. This is what I wrote:
dataset <- dataset %>%
group_by(patient_id, timepoint) %>%
group_modify(~rm.outliers(.x,"Mean")) %>%
ungroup()
The error I get when running this line is:
Error: Can't subset columns that don't exist. x Locations 41, 119, 124, 112, 130, etc. don't exist. ℹ There are only 1 column.
It makes me think it has something to do with keeping the grouping after removing the outliers, but I don't know how to approach it.
rm.outliers is a custom function that removes any row with a Mean value more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile. It works well for a single data frame, but I'm not very used to writing functions, so there may be some mistakes here:
rm.outliers <- function(data, column){
Q <- quantile(data[,c(column)], probs=c(.25, .75), na.rm = FALSE)
iqr <- IQR(data[,c(column)])
up <- Q[2]+1.5*iqr # Upper Range
low<- Q[1]-1.5*iqr # Lower Range
data <- data[data[,c(column)] < up & data[,c(column)] > low, ]
data
}
What am I doing wrong? Is there a better way of doing this using tidyverse?
Thanks for any help you can offer
Answer:
I would suggest returning logical values from the rm.outliers function and using it in filter.
library(dplyr)
rm.outliers <- function(data){
Q <- quantile(data, probs=c(.25, .75), na.rm = FALSE)
iqr <- IQR(data)
up <- Q[2]+1.5*iqr # Upper Range
low<- Q[1]-1.5*iqr # Lower Range
data < up & data > low
}
dataset %>%
group_by(patient_id, timepoint) %>%
filter(rm.outliers(Mean)) %>%
ungroup()
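As for what goes wrong in the original attempt: as far as I can tell, group_modify passes each group (.x) to the function as a tibble, and data[, column] on a tibble returns a one-column tibble rather than a plain vector. quantile() then ends up indexing columns by position, which would match the "Can't subset columns that don't exist" error. If you prefer to keep group_modify, here is a minimal sketch that only changes how the column is extracted, using [[ ]]. The set.seed call is my own addition so the example data is reproducible; it is not part of the question.

```r
library(dplyr)

# Example data from the question; the seed is an assumption added for reproducibility
set.seed(1)
dataset <- data.frame(patient_id = rep(1:5, each = 6),
                      timepoint = rep(1:2, 15),
                      Mean = c(sample(100:130, 25), 25, 315, 46, 223, 67),
                      Circ. = sample(40:99, 30) / 100,
                      Perim. = sample(1000:2500, 30))

rm.outliers <- function(data, column) {
  # [[ ]] returns a plain vector for both data.frames and tibbles
  x <- data[[column]]
  Q <- quantile(x, probs = c(.25, .75), na.rm = FALSE)
  iqr <- IQR(x)
  up <- Q[2] + 1.5 * iqr   # Upper Range
  low <- Q[1] - 1.5 * iqr  # Lower Range
  # Keep only the rows whose value lies strictly inside the fences
  data[x < up & x > low, ]
}

result <- dataset %>%
  group_by(patient_id, timepoint) %>%
  group_modify(~ rm.outliers(.x, "Mean")) %>%
  ungroup()
```

Both versions apply the same rule per group. The filter approach is shorter, while this one keeps the original function signature in case you want to reuse it on a plain data frame.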