Finding mean for specific rows in a data frame with certain values

Time:08-31

I have a data frame like below:

  a1 a2 a3
1  A  x 10
2 AA  x 20
3  P  w 13
4  R  y 45
5 BC  m 46
6 AC  y 36
7 AD  y 19
8  S  y 19
9 RK  m 30

I want to create a new data frame from this where, for each distinct value of column a2, the a3 values of the rows with different a1 values are averaged. For example, for a2 = x, I want the average of rows 1 and 2: (10 + 20)/2 = 15. My original dataset is much larger than this. Can anyone tell me how to do this in R?
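A minimal base R sketch of the calculation described above, assuming the data frame is named `df1` (as in the first answer) and that "a new data frame" means one row per distinct a2. It keeps only the first row of each distinct a1 within an a2 group, which is the same `!duplicated()` idea the answers use, and then averages a3 with `aggregate()`. The names `first_rows` and `means` are illustrative only, not from the question.

```r
# Sketch only: rebuilds the sample data shown in the question
df1 <- data.frame(
  a1 = c("A", "AA", "P", "R", "BC", "AC", "AD", "S", "RK"),
  a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"),
  a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)
)

# Keep only the first row of each distinct (a2, a1) pair, so a repeated
# a1 value inside an a2 group is counted once
first_rows <- df1[!duplicated(df1[c("a2", "a1")]), ]

# Assumption: the "new data frame" has one row per a2 holding the mean of a3
means <- aggregate(a3 ~ a2, data = first_rows, FUN = mean)
names(means)[2] <- "Mean"
means
```

For the sample data this gives m = 38, w = 13, x = 15 and y = 29.75. Because every a1 value in the sample is distinct, the deduplication step drops nothing here; it only matters on data where an a1 value repeats within a group.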

Answer:

Perhaps this helps:

library(dplyr)

df1 %>%
  group_by(a2) %>%
  # within each a2 group, average a3 over the first row of each distinct a1
  mutate(Mean = mean(a3[!duplicated(a1)], na.rm = TRUE)) %>%
  ungroup()

Answer:

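Here is a similar solution using ifelse():

library(dplyr)

df %>%
  group_by(a2) %>%
  # first occurrence of each a1 in the group gets the mean of all a3 values
  # in the group; repeated a1 values keep their own a3
  mutate(Mean = ifelse(!duplicated(a1), mean(a3, na.rm = TRUE), a3))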

Output:

  a1    a2       a3  Mean
  <chr> <chr> <int> <dbl>
1 A     x        10  15
2 AA    x        20  15
3 P     w        13  13
4 R     y        45  29.8
5 BC    m        46  38
6 AC    y        36  29.8
7 AD    y        19  29.8
8 S     y        19  29.8
9 RK    m        30  38
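The two answers agree on the sample data only because no a1 value repeats within an a2 group. The sketch below uses a hypothetical group (made-up values, not from the question) in which a1 = "A" appears twice, and evaluates the same expressions each answer puts inside `mutate()` to show where they diverge: the first averages over one row per distinct a1, while the `ifelse()` version averages over every row and leaves repeated a1 rows at their own a3.

```r
# Hypothetical single a2 group in which a1 = "A" appears twice
a1 <- c("A", "A", "B")
a3 <- c(10, 30, 20)

# First answer: mean of a3 over the first row of each distinct a1;
# mutate() recycles this single value to every row of the group
mean_first <- mean(a3[!duplicated(a1)], na.rm = TRUE)

# Second answer: the mean of all a3 values goes to first occurrences of a1,
# while repeated a1 rows keep their own a3
mean_ifelse <- ifelse(!duplicated(a1), mean(a3, na.rm = TRUE), a3)

mean_first   # (10 + 20) / 2 = 15
mean_ifelse  # 20 for the first "A" and for "B", 30 for the repeated "A"
```

Which behaviour is wanted depends on how "if the values of a1 are different" is read; the first answer appears to match the reading that a repeated a1 value should be counted only once.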
Data:

df1 <- structure(list(a1 = c("A", "AA", "P", "R", "BC", "AC", "AD",
"S", "RK"), a2 = c("x", "x", "w", "y", "m", "y", "y", "y", "m"
), a3 = c(10L, 20L, 13L, 45L, 46L, 36L, 19L, 19L, 30L)), class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9"))
df <- df1  # the second answer refers to the same data as df