Apply function on dataframe by specific group in R

Time:01-31

I have a dataframe that looks something like this:

   dist    id daytime season
3  1.11 Name1     day summer
4  2.22 Name2   night spring
5  3.33 Name1     day winter
6  4.44 Name3   night   fall

I want a summary of dist grouped by specific columns in my dataframe.

So far I used a custom function:

path.summary <- function(x){df %>%
    group_by(x) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))}

And applied it to whichever column I needed at the moment:

summary_ID <- path.summary(id)

I tried it a few weeks ago and got something like this:

  id       min    q1 median  mean    q3   max
   <chr>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
 1 Name1   0     17.8   310.   788. 1023. 5832.
 2 Name2   0     31.7   284.   570.  744. 9578.
 3 Name3   0     17.0   325.   721. 1185. 5293.
 4 Name4   0     11.9   197.   530.  865. 3476.
 5 Name5   0     24.5    94.9  617.  966. 9567.

When I try it now I get an error:

Error in `group_by()`:
! Must group by variables found in `.data`.
✖ Column `x` is not found.

What changed and how do I get around the issue?

CodePudding user response:

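Inside the function, group_by(x) looks for a column literally named x in the data, which is why the error reports that column `x` is not found. If the column is passed unquoted, embrace the argument with {{ }} so that it is forwarded to group_by(). It also helps to pass the data frame as an argument instead of relying on a global df: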

path_summary <- function(dat, x){
  dat %>%                               
    group_by({{x}}) %>% 
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist))
}

Testing:

> path_summary(df, id)
# A tibble: 3 × 7
  id      min    q1 median  mean    q3   max
  <chr> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Name1  1.11  1.66   2.22  2.22  2.78  3.33
2 Name2  2.22  2.22   2.22  2.22  2.22  2.22
3 Name3  4.44  4.44   4.44  4.44  4.44  4.44

Data:

df <- structure(list(dist = c(1.11, 2.22, 3.33, 4.44), id = c("Name1", 
"Name2", "Name1", "Name3"), daytime = c("day", "night", "day", 
"night"), season = c("summer", "spring", "winter", "fall")), 
class = "data.frame", row.names = c("3", 
"4", "5", "6"))
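
If the column names arrive as strings instead (for example from a character vector of names), {{ }} is not the right tool. One possible variant, sketched here assuming dplyr 1.0.0 or later, groups with across(all_of(...)). The helper name path_summary_chr is made up for illustration and is not part of the answer above.

```r
library(dplyr)

# Same example data as in the question
df <- data.frame(
  dist = c(1.11, 2.22, 3.33, 4.44),
  id = c("Name1", "Name2", "Name1", "Name3"),
  daytime = c("day", "night", "day", "night"),
  season = c("summer", "spring", "winter", "fall")
)

# Hypothetical helper: `cols` is a character vector of column names
path_summary_chr <- function(dat, cols) {
  dat %>%
    # all_of() selects the columns named in `cols` and errors if one is missing
    group_by(across(all_of(cols))) %>%
    summarize(min = min(dist),
              q1 = quantile(dist, 0.25),
              median = median(dist),
              mean = mean(dist),
              q3 = quantile(dist, 0.75),
              max = max(dist),
              .groups = "drop")
}

# Group by a single column given as a string
by_id <- path_summary_chr(df, "id")

# Group by several columns at once
by_both <- path_summary_chr(df, c("daytime", "season"))

# One summary per grouping column, collected in a named list
by_each <- lapply(c(id = "id", daytime = "daytime", season = "season"),
                  function(col) path_summary_chr(df, col))
```

Because all_of() accepts a vector, the same helper covers one or several grouping columns, and lapply() gives a summary for each column of interest without repeating the call by hand.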