Home > Enterprise >  Conditional mean from different column
Conditional mean from different column

Time:05-19

I do have an R data frame like this:

city2001 <- c('a', 'b', 'a')
grade2001 <- c(5, 5, 7)
city2002 <- c('b', 'b', 'a')
grade2002 <- c(8, 9, 10)

df <- data.frame(city2001, grade2001, city2002, grade2002)

and would like to return ,

avg_a = 7.333
# from (5   7   10)/3

How is the logic for that? Thanks.

CodePudding user response:

Try

mean(df[,grepl("grade",colnames(df))][df[,grepl("city",colnames(df))]=="a"])
[1] 7.333333

your df (columns) better be sorted.

If you want for all the groups and not just "a"

tapply(
  unlist(df[,grepl("grade",colnames(df))]),
  unlist(df[,grepl("city",colnames(df))]),
  mean
)
       a        b 
7.333333 7.333333

CodePudding user response:

library(tidyverse)
df %>%
    pivot_longer(everything(), names_to = c('.value', 'year'),
                  names_pattern = '(\\D )(\\d )') %>%
    group_by(city)%>%
    summarise(mean=mean(grade))

# A tibble: 2 x 2
  city   mean
  <chr> <dbl>
1 a      7.33
2 b      7.33

CodePudding user response:

Here is a one-liner via base R,

aggregate(v2 ~ v1, 
          cbind.data.frame(v1 = stack(df[c(TRUE, FALSE)])$values, 
                           v2 = stack(df[c(FALSE, TRUE)])$values), 
          mean)

#  v1       v2
#1  a 7.333333
#2  b 7.333333

CodePudding user response:

Another possible solution:

library(tidyverse)

map(list(df[1:2], df[3:4]), ~ filter(.x, .x[1] == "a") %>% pull(2)) %>% 
  unlist %>% mean

#> [1] 7.333333

Or getting means for each city:

library(tidyverse)

map_dfr(list(df[1:2], df[3:4]), ~ `colnames<-`(.x, c("city", "grade"))) %>%
  group_by(city) %>% 
  summarise(means = mean(grade))

#> # A tibble: 2 x 2
#>   city  means
#>   <chr> <dbl>
#> 1 a      7.33
#> 2 b      7.33
  •  Tags:  
  • r
  • Related