Error calculating IQR for multiple columns in a for loop

Time:12-14

I want to calculate the IQR and length of each of my numeric columns, plug them into the Freedman-Diaconis rule for the histogram bin width, and then use that bin width in a ggplot.

I can do this, as follows, with iris:

library(dplyr)
library(ggplot2)

datai = iris %>%
  filter(Species == "virginica") %>%
  select(-Species)
  
for (i in colnames(datai)) {

bw = (2* IQR(datai[,i], na.rm = T)/ length(datai[,i])^(1/3))
 
plot=  ggplot(datai, aes(x= .data[[i]])) +
    geom_histogram(binwidth = bw)
  
 print(plot)
}

but with my own dataset I get an error which arises from IQR

#MWE
datah = structure(list(DBP = c(74.667, 78.6666666666667, 82, 73, 78.6666666666667, 
                                68.6667), SBP = c(134, 114.666666666667, 126, 161, 126, 141.333
                                )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
                                ))
 
for (i in colnames(datah)) {
  
bw = (2* IQR(datah[,i], na.rm = T) )/ length(datah[,i])^(1/3)

ggp3 <- ggplot(datah, aes(x = .data[[i]] )) +
  geom_histogram( binwidth = bw) 
print(ggp3)
}

The error is:

Error in quantile(as.numeric(x), c(0.25, 0.75), na.rm = na.rm, names = FALSE, : 
'list' object cannot be coerced to type 'double'

CodePudding user response:

The dataset is a tibble, whereas iris is a data.frame. With a data.frame, selecting a single column with [, i] drops the result to a vector, which is why it works for iris; with a tibble, the result is still a tibble with a single column. Use [[ instead. According to ?IQR, the input x should be a numeric vector.
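
The difference in subsetting behavior can be checked directly. This is a minimal sketch, assuming the tibble package is installed; the toy data reuses three SBP values from the question, and the object names (df, tb, df_col, tb_col, tb_vec) are made up for illustration.

```r
library(tibble)

# Toy data reusing three SBP values from the question
df <- data.frame(SBP = c(134, 126, 161))
tb <- as_tibble(df)

df_col <- df[, "SBP"]   # data.frame: a single column is dropped to a numeric vector
tb_col <- tb[, "SBP"]   # tibble: the result stays a one-column tibble (a list)
tb_vec <- tb[["SBP"]]   # [[ extracts the column itself in both cases

is.numeric(df_col)      # TRUE
is.data.frame(tb_col)   # TRUE, which is what IQR() cannot coerce to double
is.numeric(tb_vec)      # TRUE

IQR(tb_vec, na.rm = TRUE)
```

Because [[ returns the column vector for data.frames and tibbles alike, the same loop body then works on either kind of input.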

for (i in colnames(datah)) {
  
bw <- (2* IQR(datah[[i]], na.rm = TRUE) )/ length(datah[[i]])^(1/3)
ggp3 <- ggplot(datah, aes(x = .data[[i]] )) +
  geom_histogram( binwidth = bw) 
print(ggp3)
}

-output (last column)

[image: histogram for the last column]
