With dplyr, I would like to create a new variable new_regsiege with the following conditions.
For each row, where "XX" stands for the value of regsiege:
if regsiege == "XX" and nbeta_regXX > 0, then new_regsiege = regsiege;
if regsiege == "XX" and nbeta_regXX == 0, then new_regsiege is chosen at random among the regions whose nbeta_reg column is non-zero in that row.
Here's my example:
mydf <- data.frame(
  regsiege = c("11", "24", "93"),
  nbeta_reg11 = c(0, 1, 0),
  nbeta_reg24 = c(1, 1, 0),
  nbeta_reg93 = c(1, 1, 1)
)
# Desired output
#   regsiege nbeta_reg11 nbeta_reg24 nbeta_reg93 new_regsiege
#         11           0           1           1           93   (could also be "24")
#         24           1           1           1           24
#         93           0           0           1           93
I started like this:
mydf %>%
  rowwise() %>%
  mutate(
    new_regsiege = if_else(...
  )
CodePudding user response:
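You could try reshaping to long format, picking one value per regsiege, and joining the result back. Note that this assumes each regsiege value appears on only one row:
library(tidyverse)

mydf %>%
  # One row per (regsiege, candidate region); `name` holds the region code
  pivot_longer(-1, names_prefix = "nbeta_reg") %>%
  group_by(regsiege) %>%
  # Keep regsiege if its own count is positive, otherwise draw a region with a positive count
  summarise(new_regsiege = if (value[regsiege == name] > 0) regsiege[1]
                           else sample(name[value > 0], 1)) %>%
  left_join(mydf, ., by = "regsiege")

# Possible output (the first row is random and may be "24" instead of "93"):
#   regsiege nbeta_reg11 nbeta_reg24 nbeta_reg93 new_regsiege
# 1       11           0           1           1           93
# 2       24           1           1           1           24
# 3       93           0           0           1           93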
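If you would rather stay close to the rowwise() attempt from the question, a minimal sketch might look like the following. The names pick_regsiege, nbeta_cols and codes are made up for this illustration, and the sketch assumes that every regsiege has a matching nbeta_reg column and that every row has at least one positive count.

```r
library(dplyr)

mydf <- data.frame(
  regsiege = c("11", "24", "93"),
  nbeta_reg11 = c(0, 1, 0),
  nbeta_reg24 = c(1, 1, 0),
  nbeta_reg93 = c(1, 1, 1)
)

# Count columns and the region codes they stand for, in the same order
nbeta_cols <- grep("^nbeta_reg", names(mydf), value = TRUE)
codes <- sub("^nbeta_reg", "", nbeta_cols)

# Hypothetical helper: `own` is the row's regsiege, `counts` its nbeta values
# (ordered like `codes`). Returns the code to keep for that row.
pick_regsiege <- function(own, counts, codes) {
  if (counts[match(own, codes)] > 0) return(own)
  candidates <- codes[counts > 0]
  # Draw an index rather than calling sample() on the codes directly,
  # so that a single candidate is always returned as-is
  candidates[sample.int(length(candidates), 1L)]
}

set.seed(1)  # only to make the random draw reproducible
result <- mydf %>%
  rowwise() %>%
  mutate(new_regsiege = pick_regsiege(regsiege, c_across(all_of(nbeta_cols)), codes)) %>%
  ungroup()
```

Here rows 2 and 3 keep their own regsiege, and row 1 gets either "24" or "93". Unlike the grouped version, this does not require regsiege to be unique across rows.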
CodePudding user response:
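The simplest solution would use a loop. When the row's own count is zero, the replacement is drawn only from regions whose count in that row is positive:
nbeta_cols <- grep("^nbeta_reg", names(mydf), value = TRUE)
codes <- sub("^nbeta_reg", "", nbeta_cols)
for (i in seq_len(nrow(mydf))) {
  cur_regsiege <- mydf[i, "regsiege"]
  same <- mydf[i, paste0("nbeta_reg", cur_regsiege)] > 0
  # Only regions with a positive count in this row are eligible
  eligible <- codes[unlist(mydf[i, nbeta_cols]) > 0]
  mydf[i, "new_regsiege"] <- if (same) cur_regsiege else eligible[sample.int(length(eligible), 1L)]
}

# Possible output (the first row is random and may be "24" instead of "93"):
#   regsiege nbeta_reg11 nbeta_reg24 nbeta_reg93 new_regsiege
# 1       11           0           1           1           93
# 2       24           1           1           1           24
# 3       93           0           0           1           93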
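Because the pick is random, a single run that looks right does not prove much. A small check along these lines may help confirm that every new_regsiege points at a positive count in its own row. is_valid_pick is a hypothetical helper name, and the new_regsiege values below are just one possible result written in by hand.

```r
mydf <- data.frame(
  regsiege = c("11", "24", "93"),
  nbeta_reg11 = c(0, 1, 0),
  nbeta_reg24 = c(1, 1, 0),
  nbeta_reg93 = c(1, 1, 1),
  new_regsiege = c("93", "24", "93")  # one possible result, entered by hand
)

# Hypothetical helper: TRUE for each row whose new_regsiege has a
# positive nbeta_reg* count in that same row
is_valid_pick <- function(df) {
  vapply(seq_len(nrow(df)), function(i) {
    df[i, paste0("nbeta_reg", df$new_regsiege[i])] > 0
  }, logical(1))
}

valid <- is_valid_pick(mydf)

# A deliberately wrong pick: region "11" has a zero count in row 1
bad <- mydf
bad$new_regsiege[1] <- "11"
bad_valid <- is_valid_pick(bad)
```

Running the check over many seeds is a cheap way to catch a sampler that draws from the wrong pool.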