R: Replace 0 with another value from another vector of values

I have a data.frame of student project choices in columns called choice_1 and choice_2, but some entries are 0, which means the student didn't choose a second option.

df<-data.frame(studentID=c(102,103,104), choice_1=c(699,500,750),choice_2=c(698,0,0))

I have a data.frame of all possible choices:

choices<-data.frame(choices=seq(600:800))

I would like to replace the 0s in choice_1 or choice_2 with a value drawn from the choices data.frame.

CodePudding user response：

Is this what you're looking for?

df[df == 0] <- sample(choices$choices, length(df[df == 0]))
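Note that df == 0 tests every column, including studentID, and that sample() draws without replacement by default. If only the choice columns should be touched, a minimal sketch might look like the following. The names choice_cols and sub are hypothetical helpers, and replace = TRUE is an assumption about what is wanted:

```r
df <- data.frame(studentID = c(102, 103, 104),
                 choice_1 = c(699, 500, 750),
                 choice_2 = c(698, 0, 0))
choices <- data.frame(choices = seq(600:800))  # as posted in the question

# Hypothetical helper: indices of the columns whose names start with "choice_"
choice_cols <- grep("^choice_", names(df))

# Work on a copy of just those columns so studentID is never modified
sub <- df[choice_cols]
sub[sub == 0] <- sample(choices$choices, sum(sub == 0), replace = TRUE)
df[choice_cols] <- sub
df
```

Here sum(sub == 0) counts the zeros and is equivalent to length(sub[sub == 0]).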

CodePudding user response：

Assuming you're hoping to do it for one or more choice_* columns, then:

set.seed(42)
df[,-1] <- lapply(df[,-1], function(z) {
  z[z==0] <- sample(choices$choices, size = sum(z == 0), replace = TRUE)
  z
})
df
#   studentID choice_1 choice_2
# 1       102      699      698
# 2       103      500       49
# 3       104      750       65
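The filled-in values above (49 and 65) are not between 600 and 800. That is because seq(600:800), when handed a single vector, behaves like seq_along() and returns 1 through 201. If the pool was meant to be the IDs 600 through 800 (an assumption on my part), a sketch of the difference, with wrong and choices_fixed as hypothetical names:

```r
# seq() given one vector of length > 1 returns 1..length(vector)
wrong <- seq(600:800)
range(wrong)                   # 1 201

# Assumption: the intended pool is the IDs 600 through 800
choices_fixed <- data.frame(choices = 600:800)
range(choices_fixed$choices)   # 600 800
```

Swapping choices_fixed in for choices leaves the rest of the code unchanged.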

In dplyr-ese:

library(dplyr)
df %>%
  mutate(
    across(starts_with("choice_"),
    ~ if_else(. == 0, as.numeric(sample(choices$choices, size = n(), replace = TRUE)), .)
    )
  )
#   studentID choice_1 choice_2
# 1       102      699      698
# 2       103      500       49
# 3       104      750       65

The as.numeric is there because if_else strictly enforces class: the choice_* columns are numeric (double) while your choices$choices is integer, so if_else complains about the mismatch. An alternative is to replace if_else with ifelse (and drop the as.numeric), since base::ifelse silently coerces both branches to a common type.
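To see the class mismatch for yourself, here is a small base-R sketch. The names choice_2, pool, and filled are only illustrative, and the replacement values are taken positionally from the pool rather than sampled so the result is deterministic. Newer dplyr releases may be more lenient about integer versus double, so treat the if_else behaviour as version-dependent:

```r
choice_2 <- c(698, 0, 0)     # doubles, like the choice_* columns in df
pool <- seq(600:800)         # integers, like choices$choices

class(choice_2)              # "numeric"
class(pool)                  # "integer"

# base::ifelse mixes the two types and coerces the result to double
filled <- ifelse(choice_2 == 0, pool[seq_along(choice_2)], choice_2)
class(filled)                # "numeric"
```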

CodePudding user response：

We could use across with an ifelse statement:

library(dplyr)
df %>% 
  mutate(across(starts_with("choice"), ~ifelse(.==0, sample(choices$choices), .)))

  studentID choice_1 choice_2
1       102      699      698
2       103      500       48
3       104      750      179