Mutating values for only a subset of the data under a condition, while keeping all data rows

Time:08-17

I'm trying to reverse-recode values (i.e. 1 into 5, 5 into 1, etc.) for only a subset of my participant data while keeping all rows: the participants who did not indicate English as a native language (neither Language_A nor L1 is "English"). As my dataset is quite large, I want to avoid splitting it into 2 datasets (people whose first language is English and those with other first languages) and then copy-pasting the results back into one dataframe by participant ID.

Here's a small example to illustrate it:

data1 <- data.frame(
  primary_school = c(1,2,1,3,4,5,2,1,2,1,3,1,3,3,1,1,4,2,5,1),
  high_school    = c(1,2,3,4,5,1,2,1,1,3,1,3,1,2,3,3,4,2,1,2),
  relatives      = c(1,2,3,4,5,5,2,5,5,3,1,3,5,2,3,3,4,2,1,5),
  home           = c(3,2,3,3,4,5,3,3,2,1,3,1,3,3,3,1,3,2,3,3),
  siblings       = c(1,1,1,4,1,1,2,1,1,3,1,1,1,1,1,1,4,2,1,1),
  Language_A     = c("English","English","English","Tamil","French","Malay","Romanian","English","Quechua","Zapotec",
                     "English","English","English","Tamil","French","Malay","Romanian","English","Quechua","Zapotec"),
  L1             = c("English","English","English","Tamil","French","English","English","English","Quechua","Zapotec",
                     "English","English","English","Tamil","French","Malay","Romanian","English","Quechua","Zapotec")
)

> data1
   primary_school high_school relatives home siblings Language_A       L1
1               1           1         1    3        1    English  English
2               2           2         2    2        1    English  English
3               1           3         3    3        1    English  English
4               3           4         4    3        4      Tamil    Tamil
5               4           5         5    4        1     French   French
6               5           1         5    5        1      Malay  English
7               2           2         2    3        2   Romanian  English
8               1           1         5    3        1    English  English
9               2           1         5    2        1    Quechua  Quechua
10              1           3         3    1        3    Zapotec  Zapotec
11              3           1         1    3        1    English  English
12              1           3         3    1        1    English  English
13              3           1         5    3        1    English  English
14              3           2         2    3        1      Tamil    Tamil
15              1           3         3    3        1     French   French
16              1           3         3    1        1      Malay    Malay
17              4           4         4    3        4   Romanian Romanian
18              2           2         2    2        2    English  English
19              5           1         1    3        1    Quechua  Quechua
20              1           2         5    3        1    Zapotec  Zapotec

What I first tried was filter, but it only keeps the rows that satisfy !(Language_A == "English" | L1 == "English") (people for whom neither language is English) and drops all the others, while I'd like to keep all rows:

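library(dplyr)

# Recodes correctly, but filter() throws away every row where either language is English
testtest <- data1 %>%
  filter(!(Language_A == "English" | L1 == "English")) %>%
  mutate_at(c("primary_school", "high_school", "siblings", "relatives", "home"),
            ~ recode(.x, "1" = 5, "2" = 4, "3" = 3, "4" = 2, "5" = 1))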

Is there any function that works similarly but keeps all of the data?

I also tried something like the following, but R does not accept it:

testtest<- data1 %>%
  if (Language_A !="English" | L1 != "English"){ 
  mutate_at(c("primary_school","high_school", "siblings","relatives","home"), 
           funs(recode(., "1"=5,"2"=4, "3"=3, "4"=2, "5"=1)))
  } else ()
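The if () attempt cannot work row by row: if () takes a single TRUE/FALSE value, while a comparison on a column returns one value per row (since R 4.2.0 a longer condition is an error; older versions used only the first element and warned). A minimal base R illustration, using a made-up three-element lang vector rather than the real data:

```r
# Hypothetical vector standing in for a language column
lang <- c("English", "Tamil", "Malay")

# A comparison on a column gives one TRUE/FALSE per element, not a single value,
# which is why it cannot serve as the condition of if ()
is_english <- lang == "English"
is_english            # TRUE FALSE FALSE
length(is_english)    # 3, not 1

# ifelse() is vectorised: it picks "yes" or "no" separately for every element
ifelse(is_english, "keep", "reverse")   # "keep" "reverse" "reverse"
```

This element-wise behaviour is what ifelse() and case_when() provide inside mutate().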

I've seen people solve similar problems with case_when, but it seems to be used mostly for changing a single value into another single value under different conditions, so I'm not sure how to apply it to changing multiple values (in several columns) under a single condition.

Any ideas would be very appreciated. Thanks!

Answer:

We can use across (the _at/_all variants are superseded by across) to loop over the columns that need recoding. Then, wherever neither Language_A nor L1 is 'English', subtract the value from 6 (6 - 1 = 5, 6 - 2 = 4, 6 - 3 = 3, 6 - 4 = 2, 6 - 5 = 1, assuming each of those columns only holds values from 1 to 5); otherwise return the value unchanged.
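As a quick check of the 6 - x rule, here is a minimal sketch with a hypothetical helper, reverse_scale() (not part of either answer), that generalises it to any scale running from lo to hi and makes the 1-to-5 assumption explicit:

```r
# Hypothetical helper, not from either answer: reverse a score on a scale
# running from lo to hi. On a 1-5 scale lo + hi = 6, which is the "6 - x" rule.
reverse_scale <- function(x, lo = 1, hi = 5) {
  ok <- x[!is.na(x)]
  # The arithmetic is only a reversal if every observed value lies in lo..hi
  stopifnot(all(ok >= lo & ok <= hi))
  (lo + hi) - x
}

# Same mapping as the question's recode(): 1->5, 2->4, 3->3, 4->2, 5->1
lookup <- c("1" = 5, "2" = 4, "3" = 3, "4" = 2, "5" = 1)
unname(lookup[as.character(1:5)])   # 5 4 3 2 1
reverse_scale(1:5)                  # 5 4 3 2 1

# Missing values stay missing; other scales only need different lo/hi
reverse_scale(c(2, NA, 5))                     # 4 NA 1
reverse_scale(c(0, 3, 10), lo = 0, hi = 10)    # 10 7 0
```

The range check matters because 6 - x silently turns an out-of-range value such as 0 into 6 instead of flagging it.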

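library(dplyr)
data1 %>% 
  # Apply the same rule to every column from primary_school to siblings
  mutate(across(primary_school:siblings, 
                # Neither language is English: reverse the 1-5 score; otherwise keep it
                ~ case_when(!(Language_A == "English" | L1 == "English") ~ 6 - .x,
                            TRUE ~ .x)))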

Output:

   primary_school high_school relatives home siblings Language_A       L1
1               1           1         1    3        1    English  English
2               2           2         2    2        1    English  English
3               1           3         3    3        1    English  English
4               3           2         2    3        2      Tamil    Tamil
5               2           1         1    2        5     French   French
6               5           1         5    5        1      Malay  English
7               2           2         2    3        2   Romanian  English
8               1           1         5    3        1    English  English
9               4           5         1    4        5    Quechua  Quechua
10              5           3         3    5        3    Zapotec  Zapotec
11              3           1         1    3        1    English  English
12              1           3         3    1        1    English  English
13              3           1         5    3        1    English  English
14              3           4         4    3        5      Tamil    Tamil
15              5           3         3    3        5     French   French
16              5           3         3    5        5      Malay    Malay
17              2           2         2    3        2   Romanian Romanian
18              2           2         2    2        2    English  English
19              1           5         5    3        5    Quechua  Quechua
20              5           4         1    3        5    Zapotec  Zapotec
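If you would rather not depend on dplyr, the same idea (keep all rows, change only the matching ones) can be sketched in base R with row selection and assignment back into the data frame. This is a swapped-in technique, not the answerer's code; df is a hypothetical four-row excerpt of data1 (rows 1, 4, 6 and 9, three item columns):

```r
# Hypothetical excerpt of data1: rows 1, 4, 6 and 9, three of the item columns
df <- data.frame(
  primary_school = c(1, 3, 5, 2),
  high_school    = c(1, 4, 1, 1),
  siblings       = c(1, 4, 1, 1),
  Language_A     = c("English", "Tamil", "Malay", "Quechua"),
  L1             = c("English", "Tamil", "English", "Quechua")
)

cols <- c("primary_school", "high_school", "siblings")

# Row numbers where neither language column is "English".
# which() drops NA results, so rows with a missing language are left untouched
# instead of making the subscripted assignment fail.
rows <- which(!(df$Language_A == "English" | df$L1 == "English"))

# Reverse the 1-5 items in those rows only; every other row keeps its values
df[rows, cols] <- 6 - df[rows, cols]

df$primary_school   # 1 3 5 4
df$high_school      # 1 2 1 5
df$siblings         # 1 2 1 5
```

The Tamil and Quechua rows end up with the same values as rows 4 and 9 of the dplyr output above, and the data frame still has all of its rows.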

Answer:

Here is a similar dplyr solution using ifelse:

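library(dplyr)
data1 %>% 
  # Every column except the two language columns is an item to recode
  mutate(across(-c(Language_A, L1),
                # Keep the value if either language is English, otherwise reverse it
                ~ ifelse(Language_A == "English" | L1 == "English", .x, 6 - .x)))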

Output:

   primary_school high_school relatives home siblings Language_A       L1
1               1           1         1    3        1    English  English
2               2           2         2    2        1    English  English
3               1           3         3    3        1    English  English
4               3           2         2    3        2      Tamil    Tamil
5               2           1         1    2        5     French   French
6               5           1         5    5        1      Malay  English
7               2           2         2    3        2   Romanian  English
8               1           1         5    3        1    English  English
9               4           5         1    4        5    Quechua  Quechua
10              5           3         3    5        3    Zapotec  Zapotec
11              3           1         1    3        1    English  English
12              1           3         3    1        1    English  English
13              3           1         5    3        1    English  English
14              3           4         4    3        5      Tamil    Tamil
15              5           3         3    3        5     French   French
16              5           3         3    5        5      Malay    Malay
17              2           2         2    3        2   Romanian Romanian
18              2           2         2    2        2    English  English
19              1           5         5    3        5    Quechua  Quechua
20              5           4         1    3        5    Zapotec  Zapotec
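The two answers agree on this data, but they differ if a language column can be missing: ifelse() returns NA wherever its condition is NA, while case_when() treats an NA condition like FALSE and falls through to TRUE ~ .x. A minimal sketch, assuming a hypothetical two-row df whose second participant has no language recorded:

```r
library(dplyr)

# Hypothetical two-row example: the second participant has no language recorded,
# so the "is English" test evaluates to NA for that row.
df <- data.frame(
  score      = c(2, 2),
  Language_A = c("Tamil", NA),
  L1         = c("Tamil", NA)
)

# Second answer's pattern: ifelse() returns NA wherever the condition is NA
via_ifelse <- df %>%
  mutate(across(score, ~ ifelse(Language_A == "English" | L1 == "English", .x, 6 - .x)))

# First answer's pattern: case_when() treats an NA condition like FALSE,
# so the row falls through to TRUE ~ .x and keeps its original value
via_case_when <- df %>%
  mutate(across(score, ~ case_when(!(Language_A == "English" | L1 == "English") ~ 6 - .x,
                                   TRUE ~ .x)))

via_ifelse$score     # 4 NA
via_case_when$score  # 4 2
```

If missing languages are possible in the real data, it is safer to handle them explicitly in the condition (for example with is.na()) than to rely on either default.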