I have a dataset containing a number of numeric variables whose names all start with "Ranking". For each of these variables, I want to add another variable to the dataset that contains the column mean of the corresponding original variable.
So the data look something like this:
| Ranking_blah | Ranking_bleh |
| -------- | ---------- |
| 1 | 0 |
| -1 | 1 |
| NA | 0.5 |
and what I want is:
| Ranking_blah | Ranking_bleh | Ranking_blah_mean | Ranking_bleh_mean |
| -------- | ---------- |---------------- |----------------|
| 1 | 0 | 0 | 0.5 |
| -1 | 1 | 0 | 0.5 |
| NA | 0.5 | 0 | 0.5 |
(I am aware that this way each mean variable has the same value in every row. I need this because the data will be reshaped later.)
What I've tried so far:
# Get the names of all ranking variables that need a new mean variable
ranking_variables <- names(data)[grepl("Ranking", names(data))]
# Create a new variable for each base variable and set it to the mean of that base variable
data[paste0(ranking_variables, "_mean")] <- do.call(cbind, lapply(data[ranking_variables], function(x) mean(x, na.rm = TRUE)))
The second part is not working, though: it only yields NA values. What am I doing wrong?
Answer:
An alternative approach is to use dplyr's across():
library(dplyr)
# Add a "<name>_mean" column for every column whose name starts with "Ranking"
dat |>
  mutate(across(starts_with("Ranking"), ~ mean(., na.rm = TRUE), .names = "{.col}_mean"))
Output:
# A tibble: 3 × 4
  Ranking_blah Ranking_bleh Ranking_blah_mean Ranking_bleh_mean
         <dbl>        <dbl>             <dbl>             <dbl>
1            1          0                   0               0.5
2           -1          1                   0               0.5
3           NA          0.5                 0               0.5
Data:
dat <- tibble(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))
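For comparison, the same result can probably also be reached in base R without dplyr. The following is only a minimal sketch under a few assumptions: dat is rebuilt here as a plain data frame from the example data, the "^Ranking" pattern and the names col_means and v are illustrative choices, and it is not meant as a diagnosis of why the original attempt returned NA.

```r
# Example data as a plain data frame (assumption: the object is called `dat`)
dat <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))

# Names of all columns that start with "Ranking"
ranking_variables <- grep("^Ranking", names(dat), value = TRUE)

# One mean per column, ignoring missing values
col_means <- colMeans(dat[ranking_variables], na.rm = TRUE)

# Add one "<name>_mean" column per variable; the single mean is recycled to every row
for (v in ranking_variables) {
  dat[[paste0(v, "_mean")]] <- col_means[[v]]
}
```

Assigning each mean as a single value lets R recycle it down the column, so every row of a "_mean" column holds the same number, which matches the layout requested in the question.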