I have a data frame that looks something like this:
ID | Time | Value |
---|---|---|
A | 0 | 84 |
A | 1 | 90 |
A | 2 | 76 |
A | 3 | 98 |
B | 0 | 64 |
B | 1 | 81 |
C | 0 | 89 |
C | 1 | 76 |
I need to take the mean of the first 10% of values for each ID.
I used to do something similar with the slice_head function, but back then I took the same number of rows for each ID and then used aggregate (grouped by ID) on the resulting data frame. Now that each ID has a different number of rows, slice keeps giving an error.
I have attempted map2 and lapply, but I cannot quite get it to work.
CodePudding user response:
With map you can do something like this (shown here on the built-in iris data):
library(tidyverse)
iris %>%
  group_split(Species) %>%
  map_dfr(~ .x %>% slice_head(n = nrow(.x)*.1) %>%
            group_by(Species) %>%
            summarise(mean_l = mean(Sepal.Length)))
#> # A tibble: 3 × 2
#>   Species    mean_l
#>   <fct>       <dbl>
#> 1 setosa       4.86
#> 2 versicolor   6.46
#> 3 virginica    6.4
Created on 2022-07-08 by the reprex package (v2.0.1)
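As a possible shortcut, slice_head also accepts a prop argument, so the split/map step may not be needed. The following is only a sketch on the same iris data, assuming dplyr 1.0.0 or later; note that prop rounds down, so a group with fewer than 10 rows would contribute no rows at all.

```r
library(dplyr)

# Keep the first 10% of rows within each Species, then average them.
# prop is rounded towards zero: 50 rows per species -> 5 rows kept.
res <- iris %>%
  group_by(Species) %>%
  slice_head(prop = 0.1) %>%
  summarise(mean_l = mean(Sepal.Length))

res
```

On iris this should match the map-based result above, because every species has exactly 50 rows.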
CodePudding user response:
I am assuming that in your real data each ID has far more data points than in this example.
jnk<-data.frame(ID=c(rep("A",4),rep("B",2),rep("C",2)),Time=c(0,1,2,3,0,1,0,1),Value=c(84,90,76,98,64,81,89,76))
Mean of the first 10% of values for each ID:
> mean(jnk$Value[which(jnk$ID=="A")[1:(length(jnk$Value[which(jnk$ID=="A")])*0.1)]])
> mean(jnk$Value[which(jnk$ID=="B")[1:(length(jnk$Value[which(jnk$ID=="B")])*0.1)]])
> mean(jnk$Value[which(jnk$ID=="C")[1:(length(jnk$Value[which(jnk$ID=="C")])*0.1)]])
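With enough data points per ID, each call returns the mean of the first 10% of that ID's values. With fewer than 10 points per ID, as in this small example, the index 1:(n*0.1) collapses to 1, so only the first value is used.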
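The three per-ID calls above could be generalised so that the IDs do not have to be typed out by hand. This is a minimal base R sketch: first_frac_mean is a hypothetical helper name (not from any package), and it assumes the rows are already in Time order within each ID.

```r
jnk <- data.frame(
  ID    = c(rep("A", 4), rep("B", 2), rep("C", 2)),
  Time  = c(0, 1, 2, 3, 0, 1, 0, 1),
  Value = c(84, 90, 76, 98, 64, 81, 89, 76)
)

# Hypothetical helper: mean of the first `frac` share of a vector.
# Uses at least one element, mirroring what 1:(n * 0.1) does for small n.
first_frac_mean <- function(x, frac = 0.1) {
  k <- max(1, floor(frac * length(x)))
  mean(head(x, k))
}

# Apply the helper to Value separately for every ID
res <- tapply(jnk$Value, jnk$ID, first_frac_mean)
res
```

With only 2 to 4 rows per ID, each result is simply the first Value of that ID.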
CodePudding user response:
I am not sure if this is what you want, but you could use slice with ceiling to get the first 10% of rows per group and then summarise the mean of your value column like this:
df <- data.frame(ID = c("A", "A", "A", "A", "B", "B", "C", "C"),
                 Time = c(0, 1, 2, 3, 0, 1, 0, 1),
                 Value = c(84, 90, 76, 98, 64, 81, 89, 76))
library(dplyr)
df %>%
  group_by(ID) %>%
  slice(1:ceiling(0.1 * n())) %>%
  summarise(avg_value = mean(Value))
#> # A tibble: 3 × 2
#>   ID    avg_value
#>   <chr>     <dbl>
#> 1 A            84
#> 2 B            64
#> 3 C            89
Created on 2022-07-08 by the reprex package (v2.0.1)
Please note: with so few rows per group, the first 10% rounds up to a single row, so each mean is just the first value for that ID.
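To see how the same idea behaves with larger groups of unequal size, here is a small sketch on made-up data (the data frame big and its values are hypothetical, chosen so the result is easy to check by hand). It also sorts by Time within each ID first, on the assumption that "first 10%" means the earliest observations, and uses seq_len instead of 1:.

```r
library(dplyr)

# Hypothetical data: ID "A" has 25 rows, ID "B" has 12 rows
big <- data.frame(
  ID    = c(rep("A", 25), rep("B", 12)),
  Time  = c(0:24, 0:11),
  Value = c(1:25, 101:112)
)

res <- big %>%
  group_by(ID) %>%
  arrange(Time, .by_group = TRUE) %>%          # earliest rows first within each ID
  slice(seq_len(ceiling(0.1 * n()))) %>%       # A keeps 3 rows, B keeps 2 rows
  summarise(avg_value = mean(Value))

res
```

Here A averages the values 1, 2, 3 and B averages 101, 102, so the different group lengths are handled without any error.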