I have a data.frame of student project choices in columns called choice_1 and choice_2, but some entries are 0, which means the student didn't choose a second option.
df<-data.frame(studentID=c(102,103,104), choice_1=c(699,500,750),choice_2=c(698,0,0))
I have a data.frame of all possible choices:
choices<-data.frame(choices=seq(600:800))
I would like to replace the 0s in choice_1 or choice_2 with any value from the data.frame choices.
CodePudding user response:
Is this what you're looking for?
df[df == 0] <- sample(choices$choices, length(df[df == 0]))
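A side note on the setup, since it affects what gets sampled: seq(600:800) in the question does not produce 600 to 800. When seq() gets a single vector argument it behaves like seq_along() and returns 1 to 201. The sketch below assumes the intended pool really is 600 to 800; the names wrong, right, and zero are only for illustration.

```r
# seq() with one vector argument returns 1..length(x), not the values of x
wrong <- seq(600:800)
range(wrong)
# [1]   1 201

# Assumed intent: the integers 600 through 800
right <- seq(600, 800)   # equivalently 600:800
range(right)
# [1] 600 800

choices <- data.frame(choices = right)

df <- data.frame(studentID = c(102, 103, 104),
                 choice_1 = c(699, 500, 750),
                 choice_2 = c(698, 0, 0))

# Same replacement as above, with the logical mask computed once
zero <- df == 0
df[zero] <- sample(choices$choices, sum(zero))
df
```

Note that sample() draws without replacement by default, so this form needs at least as many values in choices as there are zeros; pass replace = TRUE if that may not hold.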
CodePudding user response:
Assuming you're hoping to do it for one or more choice_* columns, then:
set.seed(42)
df[,-1] <- lapply(df[,-1], function(z) {
  z[z==0] <- sample(choices$choices, size = sum(z == 0), replace = TRUE)
  z
})
df
# studentID choice_1 choice_2
# 1 102 699 698
# 2 103 500 49
# 3 104 750 65
In dplyr-ese:
library(dplyr)
df %>%
  mutate(
    across(starts_with("choice_"),
           ~ if_else(. == 0, as.numeric(sample(choices$choices, size = n(), replace = TRUE)), .)
    )
  )
# studentID choice_1 choice_2
# 1 102 699 698
# 2 103 500 49
# 3 104 750 65
The as.numeric is there because if_else strictly enforces class: the choice_* columns are numeric, but your choices$choices is integer, so if_else complains. An alternative is to replace if_else with ifelse (and remove the as.numeric), since base::ifelse is a bit sloppier in that regard.
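To make the class point concrete, here is a small sketch comparing the two functions on a double vector and an integer pool. The names x, pool, strict, and loose are made up for the example. As far as I know, newer dplyr releases (1.1.0 onward) cast integer and double to a common type instead of raising an error, so whether the cast is strictly required depends on your installed version; keeping as.numeric is harmless either way.

```r
library(dplyr)

x <- c(698, 0, 0)   # double, like the choice_* columns
pool <- 600:800     # integer, like choices$choices (hypothetical stand-in)

class(x)
# [1] "numeric"
class(pool)
# [1] "integer"

set.seed(42)

# if_else with an explicit cast so both branches are double
strict <- if_else(x == 0,
                  as.numeric(sample(pool, size = length(x), replace = TRUE)),
                  x)

# base::ifelse coerces the integer draws to double without comment
loose <- ifelse(x == 0,
                sample(pool, size = length(x), replace = TRUE),
                x)

class(strict)
class(loose)
```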
CodePudding user response:
We could use across with an ifelse statement:
library(dplyr)
df %>%
mutate(across(starts_with("choice"), ~ifelse(.==0, sample(choices$choices), .)))
studentID choice_1 choice_2
1 102 699 698
2 103 500 48
3 104 750 179
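One detail of the call above: sample(choices$choices) with no size returns a full permutation of the pool, and ifelse then picks the elements at the positions where the test is TRUE. The base R sketch below illustrates that behaviour and shows a variant with an explicit size; x, pool, perm, and res are hypothetical names, and the pool is assumed to be 600 to 800.

```r
set.seed(1)

x <- c(698, 0, 0)   # stand-in for one choice_* column
pool <- 600:800     # assumed pool of valid choices

# Full permutation of the pool, much longer than x
perm <- sample(pool)
length(perm)
# [1] 201

# ifelse uses perm at the positions where x == 0 (here positions 2 and 3)
res <- ifelse(x == 0, perm, x)
all(res[2:3] == perm[2:3])
# [1] TRUE

# Variant that draws exactly one candidate per row, allowing repeats
res2 <- ifelse(x == 0, sample(pool, size = length(x), replace = TRUE), x)
```

The explicit size makes the intent clearer and does not depend on the pool being at least as long as the column.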