Goal: using R, I want to add two new columns to my data frame, each holding a row-wise mean over a subset of the columns:
One column with the mean of each row over only the columns whose names do not contain the string "_X".
One column with the mean of each row over only the columns whose names do contain the string "_X".
For example:
phone1  phone1_X  phone2  phone2_X  phone3  phone3_X
1       2         3       4         5       6
2       4         6       8         10      12
Results:
Mean_of_none_X
3   (1 + 3 + 5)/3
6   (2 + 6 + 10)/3
Mean_of_X
4
8
Thank you!
CodePudding user response:
Try using rowMeans and grep over the column names to include or exclude certain columns:
# only the "_X" columns (grep is case-sensitive, so the pattern needs the capital X)
rowMeans(df[, grep("_X", colnames(df))])
# only the columns without "_X"
rowMeans(df[, -grep("_X", colnames(df))])
Output:
rowMeans(df[, grep("_X", colnames(df))])
#> [1] 4 8
rowMeans(df[, -grep("_X", colnames(df))])
#> [1] 3 6
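To actually add the two columns the question asks for, the same idea can be sketched end to end. This is only a minimal sketch: the way df is built below is an assumption (the question shows just the printed table), and is_x is a helper name introduced here for illustration. It uses grepl to get a logical mask, which stays safe under negation even when no column name matches.

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  phone1 = c(1, 2),  phone1_X = c(2, 4),
  phone2 = c(3, 6),  phone2_X = c(4, 8),
  phone3 = c(5, 10), phone3_X = c(6, 12)
)

# Logical mask: TRUE where the column name contains "_X" (case-sensitive)
is_x <- grepl("_X", colnames(df), fixed = TRUE)

# Compute both means before adding any column, so the mask still lines up
# with the original six columns; drop = FALSE keeps a data frame even if
# only one column is selected
mean_none_x <- rowMeans(df[, !is_x, drop = FALSE])
mean_x      <- rowMeans(df[, is_x, drop = FALSE])

df$Mean_of_none_X <- mean_none_x
df$Mean_of_X      <- mean_x

df
```

With the example data, Mean_of_none_X comes out as 3 and 6, and Mean_of_X as 4 and 8, matching the expected results in the question.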
CodePudding user response:
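Try this, which splits the columns by whether their names end in "_X" and then takes rowMeans of each group: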
> lapply(split.default(df, endsWith(names(df), "_X")), rowMeans)
$`FALSE`
[1] 3 6
$`TRUE`
[1] 4 8
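The list returned above can be attached back to the data frame as the two requested columns. This is a minimal sketch, assuming df is built as below (the question only shows the printed table) and using means as an illustrative variable name. The list elements are named "FALSE" and "TRUE" after the grouping values.

```r
# Rebuild the example data from the question (assumed construction)
df <- data.frame(
  phone1 = c(1, 2),  phone1_X = c(2, 4),
  phone2 = c(3, 6),  phone2_X = c(4, 8),
  phone3 = c(5, 10), phone3_X = c(6, 12)
)

# Split the columns into two groups by name suffix, then average each row
means <- lapply(split.default(df, endsWith(names(df), "_X")), rowMeans)

# "FALSE" holds the columns not ending in "_X", "TRUE" the ones that do
df$Mean_of_none_X <- means[["FALSE"]]
df$Mean_of_X      <- means[["TRUE"]]

df
```

Note that endsWith only matches names that end in "_X", which is slightly stricter than "contains" but equivalent for the column names in the example.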
CodePudding user response:
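library(dplyr)
df %>%
  rowwise() %>%
  mutate(
    # mean of the columns whose names contain "_X"
    x_mean = mean(c_across(contains("_X"))),
    # mean of the remaining columns, excluding the x_mean column just created
    notx_mean = mean(c_across(!contains("_X") & !contains("_mean")))
  ) %>%
  ungroup()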