I have a big dataframe similar to this one:
df <- data.frame(sample=c('s1a', 's1b', 's2a', 's2b', 's3a', 's3b'), Mg=1:6, P=7:12, K=3:8)
where "a" and "b" are repeated measurements of the same sample. I would like to obtain a new data frame with the mean of each measurement per sample (s1, s2, s3), something like this:
df_new <- data.frame(sample=c('s1', 's2', etc..), Mg=1.5, etc.., P=7.5, etc.., K=3.5, etc)
CodePudding user response:
You can use aggregate() together with sub() to remove the trailing "a" and "b" from the sample names.
aggregate(. ~ sample, transform(df, sample = sub("[ab]$", "", sample)), mean)
#aggregate(. ~ sample, within(df, sample <- sub("[ab]$", "", sample)), mean) #Alternative
#aggregate(df[-1], list(sample=sub("[ab]$", "", df[,1])), mean) #Alternative
# sample Mg P K
#1 s1 1.5 7.5 3.5
#2 s2 3.5 9.5 5.5
#3 s3 5.5 11.5 7.5
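To make the two steps in that one-liner easier to follow, here is a small sketch that splits them apart. It assumes every sample name ends in a single replicate letter, a or b; the names ids and res are only illustrative.

```r
df <- data.frame(sample = c('s1a', 's1b', 's2a', 's2b', 's3a', 's3b'),
                 Mg = 1:6, P = 7:12, K = 3:8)

# Step 1: strip the trailing replicate letter to get the grouping key
ids <- sub("[ab]$", "", df$sample)
ids
# [1] "s1" "s1" "s2" "s2" "s3" "s3"

# Step 2: average every measurement column (all but the first) within each key
res <- aggregate(df[-1], list(sample = ids), mean)
res
```

Anchoring the pattern with $ means only a letter at the very end of the name is removed, so an "a" or "b" elsewhere in a sample name would be left alone.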
CodePudding user response:
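library(tidyverse)
# Group by the first two characters of the sample name (s1, s2, s3),
# then take the mean of every remaining column within each group
df %>%
  group_by(sample = str_extract(sample, ".{0,2}")) %>%
  summarise(across(everything(), mean))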
# A tibble: 3 × 4
#   sample    Mg     P     K
#   <chr>  <dbl> <dbl> <dbl>
# 1 s1       1.5   7.5   3.5
# 2 s2       3.5   9.5   5.5
# 3 s3       5.5  11.5   7.5
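One thing to watch: ".{0,2}" keeps only the first two characters, so a name like s10a would end up grouped with s1a. If the real data has longer IDs, something along these lines may be safer. This is only a sketch, assuming replicates are always marked by a trailing a or b; df2 and res2 are made-up names and the data is hypothetical.

```r
library(dplyr)
library(stringr)

# Hypothetical data with a two-digit sample ID
df2 <- data.frame(sample = c('s1a', 's1b', 's10a', 's10b'),
                  Mg = 1:4, P = 7:10, K = 3:6)

# Drop only the trailing replicate letter, then average within each sample
res2 <- df2 %>%
  group_by(sample = str_remove(sample, "[ab]$")) %>%
  summarise(across(everything(), mean))
res2
```

Here s1 and s10 stay separate groups because the key is built by removing the suffix rather than by truncating to a fixed width.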