I have a data frame like the one below:

   a1 a2 a3
1   A  x 10
2  AA  x 20
3   P  w 13
4   R  y 45
5  BC  m 46
6  AC  y 36
7  AD  y 19
8   S  y 19
9  RK  m 30
I want to create a new data frame from this in which, for each distinct value of column a2, the rows with different values of a1 are averaged using the values in column a3. For example, for a2 = x, I want the average (10 + 20)/2 = 15 (rows 1 and 2, using the values of column a3). My original dataset is much larger than this. Can anyone tell me how to do this in R?
CodePudding user response:
Perhaps this helps:
library(dplyr)

df1 %>%
  group_by(a2) %>%
  # average a3 over the rows whose a1 value has not already appeared in the group
  mutate(Mean = mean(a3[!duplicated(a1)], na.rm = TRUE)) %>%
  ungroup()
CodePudding user response:
Here is a similar solution using an ifelse statement:
library(dplyr)

df %>%
  group_by(a2) %>%
  mutate(Mean = ifelse(!duplicated(a1), mean(a3, na.rm = TRUE), a3))
  a1    a2       a3  Mean
  <chr> <chr> <int> <dbl>
1 A     x        10  15
2 AA    x        20  15
3 P     w        13  13
4 R     y        45  29.8
5 BC    m        46  38
6 AC    y        36  29.8
7 AD    y        19  29.8
8 S     y        19  29.8
9 RK    m        30  38
Data:
df <- structure(list(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
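
Both answers use mutate, which keeps all nine rows and adds a Mean column. If a separate data frame with one row per a2 value is wanted, as the question seems to describe, something along these lines might work. This is a minimal base R sketch, not taken from either answer: the names distinct_rows and group_means are made up for illustration, and it assumes that "different values of a1" means each (a1, a2) pair should be counted once.

```r
# Sample data from the question, rebuilt by hand
df <- data.frame(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)
)

# Assumption: a repeated a1 within the same a2 group should count only once,
# so keep the first row for each (a1, a2) pair
distinct_rows <- df[!duplicated(df[c("a1", "a2")]), ]

# Collapse to one row per a2, holding the mean of a3 for that group
group_means <- aggregate(a3 ~ a2, data = distinct_rows, FUN = mean)
names(group_means)[2] <- "Mean"

print(group_means)
```

With dplyr, replacing mutate with summarise in the first answer should give a comparable one-row-per-group result.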