R: How to loop over a name-based selection of variables from a dataframe and for each create a new v

Time:07-21

I have a dataset containing a number of numeric variables whose names all start with "Ranking". For each of these variables, I want to add another variable to the dataset that contains the column mean of the respective variable.

So the data look something like this:

| Ranking_blah | Ranking_bleh |
| ------------ | ------------ |
| 1            | 0            |
| -1           | 1            |
| NA           | 0.5          |

and what I want is:

| Ranking_blah | Ranking_bleh | Ranking_blah_mean | Ranking_bleh_mean |
| ------------ | ------------ | ----------------- | ----------------- |
| 1            | 0            | 0                 | 0.5               |
| -1           | 1            | 0                 | 0.5               |
| NA           | 0.5          | 0                 | 0.5               |

(I am aware that each mean variable then holds the same value in every row; I need this because the data will be reshaped later.)

What I've tried so far:

#getting a list of all ranking variables I want to create a new mean variable from

ranking_variables = names(data)[grepl("Ranking", names(data))]

#creating a new variable for each base variable in the list and setting it to the mean of the respective base variable

data[paste0(ranking_variables, "_mean")] <- do.call(cbind, lapply(data[ranking_variables], function(x) mean(x, na.rm = TRUE)))

The second part is not working, though: it only yields NA values. What am I doing wrong?

CodePudding user response:

An alternative approach is to use dplyr's across:

library(dplyr)

# For every column whose name starts with "Ranking", add a "<name>_mean" column
dat |>
    mutate(across(starts_with("Ranking"), ~ mean(., na.rm = TRUE), .names = "{.col}_mean"))

Output:

# A tibble: 3 × 4
  Ranking_blah Ranking_bleh Ranking_blah_mean Ranking_bleh_mean
         <dbl>        <dbl>             <dbl>             <dbl>
1            1          0                   0               0.5
2           -1          1                   0               0.5
3           NA          0.5                 0               0.5

Data:

dat <- tibble(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))
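
If you would rather stay in base R, the same result can probably be reached with a plain loop over the selected names. This is only a minimal sketch, assuming the "Ranking" columns are all numeric; the data frame `dat` and the vector `ranking_variables` are rebuilt here purely for illustration, with the same values as the example data above:

```r
# Example data as a plain data frame (illustrative values)
dat <- data.frame(Ranking_blah = c(1, -1, NA), Ranking_bleh = c(0, 1, 0.5))

# Names of all columns that start with "Ranking"
ranking_variables <- grep("^Ranking", names(dat), value = TRUE)

# For each selected column, add a "<name>_mean" column;
# the single mean value is recycled down all rows
for (v in ranking_variables) {
  dat[[paste0(v, "_mean")]] <- mean(dat[[v]], na.rm = TRUE)
}

print(dat)
```

Because `ranking_variables` is computed before the loop, the newly added `_mean` columns are not picked up again, even though their names also start with "Ranking".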