I was wondering if there is an efficient way to take the average of each set of similarly named columns whose names end in ...1S
and ...2S
(ex: ex1S
,ex2S
) at time==1
and take the average of each set of similarly named columns whose names end in ...1C
or ...2C
(ex: ex1C
,ex2C
) at time==2
to achieve my Desired_output
below?
Note: In my actual Data
, I have several columns ending in ...1S
and ...1C
to average across, so, a functional answer is much appreciated.
My current inefficient solution is:
library(tidyverse)
Data %>%
mutate(ave_ex = case_when(
time == 1 ~ mean(c(ex1S, ex2S)),
time == 2 ~ mean(c(ex1C, ex2C))
),
ave_id = case_when(
time == 1 ~ mean(c(id1S, id2S)),
time == 2 ~ mean(c(id1C, id2C))
)) %>% select(-c(ex1S:id2C))
Data = read.table(text = "
order DV score time ex1S ex2S ex1C ex2C id1S id2S id1C id2C k t
s-c ac 1 1 8 5 6 1 2 4 3 7 400 30
s-c bc 2 1 8 5 6 1 2 4 3 7 400 30
s-c ac 3 2 8 5 6 1 2 4 3 7 600 50
s-c bc 4 2 8 5 6 1 2 4 3 7 600 50
", header = TRUE)
Desired_output = "
order time DV score k t ave_ex ave_id
s-c 1 ac 1 400 30 (8 5)/2 =6.5 (2 4)/2 =3
s-c 1 bc 2 400 30 (8 5)/2 =6.5 (2 4)/2 =3
s-c 2 ac 3 600 50 (6 1)/2 =3.5 (3 7)/2 =5
s-c 2 bc 4 600 50 (6 1)/2 =3.5 (3 7)/2 =5
"
where ave_
= average i.e., mean()
.
CodePudding user response:
If we need to automate, we may need a matching with the time
and the C
, S
category. Below, we select only the 'ex', 'id' columns, then split
, the data into a list
of datasets having matching substring of column names i.e. without the digits (str_remove
), loop over the list
with map
, get the mean
, enframe
to a two column tibble, create time
column based on the substring in name, reshape to wide with pivot_wider
and use the time
column to join with original Data and select the columns of interest
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
Data %>%
dplyr::select(starts_with("ex"), starts_with("id")) %>%
split.default(str_remove(names(.), "\\d ")) %>%
map(~ mean(rowMeans(.x))) %>%
enframe %>%
unnest(value) %>%
mutate(time = c(1, 2)[str_detect(name, "C") 1],
name = str_c("ave_", str_remove(name, "[A-Z] "))) %>%
pivot_wider(names_from = name, values_from = value) %>%
left_join(Data, .) %>%
dplyr::select(-starts_with("ex"), -starts_with("id"))
-output
order DV score time k t ave_ex ave_id
1 s-c ac 1 1 400 30 6.5 3
2 s-c bc 2 1 400 30 6.5 3
3 s-c ac 3 2 600 50 3.5 5
4 s-c bc 4 2 600 50 3.5 5