Finding mean for specific rows in a data frame with certain values

I have a data frame like the one below:

  a1 a2 a3
1  A  x 10
2 AA  x 20
3  P  w 13
4  R  y 45
5 BC  m 46
6 AC  y 36
7 AD  y 19
8  S  y 19
9 RK  m 30

I want to create a new data frame from this where, for each distinct value of column a2, the rows with different values of a1 are averaged using the values of column a3. For example, for a2 = x, I want the average (10 + 20)/2 = 15 (rows 1 and 2, using the values of column a3). My original dataset is much larger than this. Can anyone tell me how to do this in R?

CodePudding user response:

Perhaps this helps

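library(dplyr)

# For each a2 group, average a3 over the rows whose a1 is not a repeat
# within that group; every row of the group gets the same Mean.
df1 %>%
  group_by(a2) %>%
  mutate(Mean = mean(a3[!duplicated(a1)], na.rm = TRUE)) %>%
  ungroup()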
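The question asks for a new data frame, and mutate() keeps all of the original rows. If one row per value of a2 is what is wanted, a minimal sketch is to swap mutate() for summarise(). The object name means_by_a2 is only an illustrative choice, the data frame is rebuilt from the sample data in the question, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question, rebuilt so the example is self-contained.
df1 <- data.frame(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L),
  stringsAsFactors = FALSE
)

# One row per a2: mean of a3 over rows whose a1 is not a repeat in the group.
# means_by_a2 is a hypothetical name, not something from the original answer.
means_by_a2 <- df1 %>%
  group_by(a2) %>%
  summarise(Mean = mean(a3[!duplicated(a1)], na.rm = TRUE), .groups = "drop")

print(means_by_a2)
```

With the sample data this should give four rows (m, w, x, y), with Mean equal to 15 for a2 = x, matching the example in the question.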

CodePudding user response:

Here is a similar solution using an ifelse statement:

library(dplyr)

df %>% 
  group_by(a2) %>% 
  mutate(Mean = ifelse(!duplicated(a1), mean(a3, na.rm = TRUE), a3))

  a1    a2       a3  Mean
  <chr> <chr> <int> <dbl>
1 A     x        10  15  
2 AA    x        20  15  
3 P     w        13  13  
4 R     y        45  29.8
5 BC    m        46  38  
6 AC    y        36  29.8
7 AD    y        19  29.8
8 S     y        19  29.8
9 RK    m        30  38
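If dplyr is not available, roughly the same per-group mean can be sketched in base R. This is an alternative approach, not what either answer proposes: it drops rows that repeat an (a1, a2) pair and then averages a3 within each a2 using aggregate(). The names distinct_rows and means_by_a2 are illustrative only, and the result has one row per a2 rather than one row per original row.

```r
# Sample data from the question, rebuilt so the example is self-contained.
df <- data.frame(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L),
  stringsAsFactors = FALSE
)

# Keep only the first row for each (a1, a2) pair, so a repeated a1
# inside a group is counted once.
distinct_rows <- df[!duplicated(df[c("a1", "a2")]), ]

# Average a3 within each value of a2; the result is ordered by a2.
means_by_a2 <- aggregate(a3 ~ a2, data = distinct_rows, FUN = mean)
names(means_by_a2)[2] <- "Mean"

print(means_by_a2)
```

The group a2 = y comes out as 29.75 here; the tibble output above shows the same value rounded to 29.8.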

Data:

df <- structure(list(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)),
  class = "data.frame",
  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))