Average of Values of Different Percentage Lengths-CodePudding

I have a data frame that looks something like this:

ID	Time	Value
A	0	84
A	1	90
A	2	76
A	3	98
B	0	64
B	1	81
C	0	89
C	1	76

I need to take the mean of the first 10% of values for each ID.

I used to do something similar with the slice_head function, but back then I took the same number of rows for each ID and then used aggregate (grouped by ID) on the resulting data frame. Now that each ID has a different number of rows, slice keeps giving an error.

I have attempted map2 and lapply, but I cannot quite get it to work.

CodePudding user response：

With purrr's map_dfr you can do something like this:

library(tidyverse)

iris %>% 
  group_split(Species) %>% 
  map_dfr(~ .x %>% slice_head(n = nrow(.x)*.1) %>%
            group_by(Species) %>% 
            summarise(mean_l = mean(Sepal.Length)))
#> # A tibble: 3 × 2
#>   Species    mean_l
#>   <fct>       <dbl>
#> 1 setosa       4.86
#> 2 versicolor   6.46
#> 3 virginica    6.4

Created on 2022-07-08 by the reprex package (v2.0.1)
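A possibly simpler variant, offered only as a sketch: slice_head also accepts a prop argument and works per group on a grouped data frame, so the split/map step may not be needed. As far as I know, prop is rounded down, so an ID with fewer than 10 rows would contribute no rows at all. Treat that as an assumption to check against your dplyr version.

```r
library(dplyr)

# Same iris example as above, without group_split/map_dfr
res <- iris %>%
  group_by(Species) %>%
  slice_head(prop = 0.1) %>%               # first 10% of rows within each Species
  summarise(mean_l = mean(Sepal.Length))   # one mean per Species

res
```

With 50 rows per species this keeps 5 rows per group, the same rows the map version selects.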

CodePudding user response：

I am assuming that in the original data each ID has far more data points than in this small example.

jnk<-data.frame(ID=c(rep("A",4),rep("B",2),rep("C",2)),Time=c(0,1,2,3,0,1,0,1),Value=c(84,90,76,98,64,81,89,76))

Mean of 1st 10% of each ID:

# Mean of the first 10% of rows per ID (rounded down, but always at least one row)
mean(head(jnk$Value[jnk$ID == "A"], max(1, floor(0.1 * sum(jnk$ID == "A")))))
mean(head(jnk$Value[jnk$ID == "B"], max(1, floor(0.1 * sum(jnk$ID == "B")))))
mean(head(jnk$Value[jnk$ID == "C"], max(1, floor(0.1 * sum(jnk$ID == "C")))))

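With a sufficiently large number of data points per ID, each line returns the mean of the first 10% of that ID's values. With fewer than 10 points per ID, as in this toy data, only the first value is used.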
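To avoid writing one line per ID, the same idea can be sketched in base R with tapply. The helper name first_tenth_mean and its frac parameter are made up for this illustration, and the rounding rule (10% rounded down, at least one row kept) is an assumption; swap in ceiling if you prefer to round up.

```r
jnk <- data.frame(ID = c(rep("A", 4), rep("B", 2), rep("C", 2)),
                  Time = c(0, 1, 2, 3, 0, 1, 0, 1),
                  Value = c(84, 90, 76, 98, 64, 81, 89, 76))

# Hypothetical helper: mean of the first `frac` share of a vector,
# rounded down but never fewer than one element
first_tenth_mean <- function(v, frac = 0.1) {
  k <- max(1, floor(frac * length(v)))
  mean(head(v, k))
}

# Apply the helper to Value separately for every ID
res <- tapply(jnk$Value, jnk$ID, first_tenth_mean)
res
```

This relies on the rows already being in the order you want (here, by Time within each ID); sort first if that is not guaranteed.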

CodePudding user response：

I am not sure if this is what you want, but you could use slice with ceiling to get the first 10% of rows per group and then summarise the mean of your Value column like this:

df <- data.frame(ID = c("A", "A", "A", "A", "B", "B", "C", "C"),
                 Time = c(0, 1, 2, 3, 0, 1, 0, 1),
                 Value = c(84, 90, 76, 98, 64, 81, 89, 76))

library(dplyr)
df %>%
  group_by(ID) %>%
  slice(1:ceiling(0.1 * n())) %>%
  summarise(avg_value = mean(Value))
#> # A tibble: 3 × 2
#>   ID    avg_value
#>   <chr>     <dbl>
#> 1 A            84
#> 2 B            64
#> 3 C            89

Created on 2022-07-08 by the reprex package (v2.0.1)

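Please note: the results look odd because the example has so few rows per ID. With ceiling, 10% of 4 rows or of 2 rows rounds up to a single row, so each mean is simply the first Value of that ID.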
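To see how this behaves with more rows and unequal group sizes, here is a sketch on made-up data. The group sizes (40, 25 and 12) and the counting Value column are purely hypothetical, chosen so the selected rows are easy to check by eye.

```r
library(dplyr)

# Hypothetical data: three IDs with 40, 25 and 12 rows;
# Value simply counts 1, 2, 3, ... within each ID
sizes <- c(A = 40, B = 25, C = 12)
big <- data.frame(
  ID = rep(names(sizes), times = sizes),
  Value = unlist(lapply(sizes, seq_len), use.names = FALSE)
)

res <- big %>%
  group_by(ID) %>%
  summarise(n_rows = n(),
            n_used = ceiling(0.1 * n()),               # rows that count as "first 10%"
            avg_value = mean(Value[seq_len(n_used)]))  # mean over just those rows

res
```

Here n_used comes out as 4, 3 and 2, so every ID contributes at least one row regardless of its length.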