Home > Blockchain >  Get mean column values every n rows grouped by value in other column
Get mean column values every n rows grouped by value in other column

Time:06-10

I have a dataframe df that looks like this

  time object
1     1      A
2     2      A
3     3      A
4     4      A
5     5      A
6     6      A
7     7      B
8     8      B
9     9      B
10   10      B
11   11      B
12   12      C
13   13      C
14   14      C
15   15      C
16   16      C
17   17      C
18   18      C

I would like to get the mean of the timecolumn every 3 rows based on the object column

df_mean
  time object
1    2      A
2    5      A
3    8      B
4   13      C
5   16      C

I though about using dplyr

df%>% 
  mutate(grp = 1  (row_number()-1) %/% 3) %>% 
  group_by(grp) %>% 
  summarise(across(c("time"), mean, na.rm = TRUE)) %>% 
  select(-grp)

but I do not know how to integrate the control for the object.

Another option would be to use aggregate

aggregate(.~object, data=df, mean)

but in this case I do not know how to get the mean every 3 rows.

CodePudding user response:

Your dplyr attempt is on the right track. With a few modifications it will work.

library(dplyr)

df <- tibble(time = 1:18, object = rep(c('A', 'B', 'C'), each = 6))

df %>%
  group_by(object, grp = (row_number()-1) %/% 3) %>%
  summarise(across(time, mean, na.rm = T), .groups = 'drop') %>%
  select(-grp)

#> # A tibble: 6 × 2
#>   object  time
#>   <chr>  <dbl>
#> 1 A          2
#> 2 A          5
#> 3 B          8
#> 4 B         11
#> 5 C         14
#> 6 C         17

CodePudding user response:

Here is an alternative dplyr way:

library(dplyr)

n = 3

df %>% 
  group_by(object, Col2 = rep(row_number(), each=n, length.out = n())) %>% 
  summarise(time = mean(time, na.rm = TRUE)) %>% 
  select(-Col2)
  object  time
  <chr>  <dbl>
1 A          2
2 A          5
3 B          8
4 B         11
5 C         14
6 C         17

CodePudding user response:

You can do a slight modification, creating your grp variable within object group first, and then filtering where the size of the joint grouping is >=3, and then summarize:

df%>% 
  group_by(object) %>% 
  mutate(grp = 1  (row_number()-1) %/% 3) %>% 
  group_by(object,grp) %>% 
  filter(n()>=3) %>% 
  summarize(time=mean(time), .groups="drop") %>% 
  select(-grp)

Output:

  object  time
  <chr>  <dbl>
1 A          2
2 A          5
3 B          8
4 C         13
5 C         16

CodePudding user response:

data.table option:

library(data.table)
setDT(df)
df[, n3:=gl(.N, 3, length=.N), by=object]
df[, .(time=mean(time)), by=.(object, n3)][, !"n3"]

Output:

   object time
1:      A    2
2:      A    5
3:      B    8
4:      B   11
5:      C   14
6:      C   17

CodePudding user response:

in base R:

aggregate(time~., cbind(df, gr=gl(nrow(df),3, nrow(df))), mean)

  object gr time
1      A  1    2
2      A  2    5
3      B  3    8
4      B  4   11
5      C  5   14
6      C  6   17
  •  Tags:  
  • r
  • Related