I've searched for what to do in this case and haven't found much information I could use, so any advice would be greatly appreciated.
I have a dataset that separates males and females for certain variables. I would like to combine them and use the combined variable in logistic regression.
Here is an example of how the data look:
male<- c("weekly","monthly","","never","","","weekly")
female<- c("","","never","","daily","weekly","")
df<-data.frame(male,female)
My code looks like this:
df$combined<- paste(df$male,df$female)
model_00_ <- glm(outcome ~ main_predictor + combined, data = df, family = binomial(link = "logit"))
exp(cbind(OR=coef(model_00_),confint(model_00_)))
But when I do, the output looks like this (the numbers are arbitrary, for simplicity):
           OR    2.5%  97.5%
intercept  9     6     11
daily      4     3     7
weekly     3     2     6
monthly    2.5   1.5   4
never      0.75  0.6   0.9
daily      4     3     7
weekly     3     2     6
monthly    2.5   1.5   4
never      NA    NA    NA
I think this is happening because of the paste function, but I am unsure how to combine the two variables without it.
Answer:
As others have mentioned, paste is a bad solution because it adds whitespace between the things being pasted. But I do not like using paste0 either, because it doesn't really treat the original variables as data; it just pastes them together as characters.
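To see the problem concretely, here is a small sketch using the question's example vectors (where missing answers are empty strings). paste() inserts its default separator, a single space, so the same answer ends up as two different strings depending on which column it came from. paste0() happens to give clean values here, but only because exactly one of each pair is empty.

```r
# Example vectors from the question: missing answers are empty strings
male <- c("weekly", "monthly", "", "never", "", "", "weekly")
female <- c("", "", "never", "", "daily", "weekly", "")

# paste() joins with sep = " " by default, leaving a stray space
# on whichever side the empty value was
pasted <- paste(male, female)
print(pasted)

# 4 distinct answers become 6 distinct strings ("weekly " vs " weekly", etc.),
# so a factor built from them has duplicated categories
print(length(unique(pasted)))

# paste0() uses no separator, so the strings come out clean in this case
pasted0 <- paste0(male, female)
print(unique(pasted0))
```

The duplicated strings are what show up as repeated categories in the odds-ratio table above.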
As Limey's comment above mentions, I think coalesce is a better solution than either. coalesce(x, y) goes through the vectors element by element and takes the value of x unless it is NA, in which case the value of y is used. Thus:
male <- c("weekly", "monthly", NA, "never", NA, NA, "weekly")
female <- c(NA, NA, "never", NA, "daily", "weekly", NA)
df <- data.frame(male, female)
df
male female
1 weekly <NA>
2 monthly <NA>
3 <NA> never
4 never <NA>
5 <NA> daily
6 <NA> weekly
7 weekly <NA>
library(dplyr)
desired_output <- coalesce(male, female)
desired_output
[1] "weekly" "monthly" "never" "never" "daily" "weekly" "weekly"
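If you would rather not load dplyr for this, a base-R sketch of the same idea for exactly two vectors is an ifelse() on is.na(). This is an equivalent swapped in for illustration, not what coalesce does internally.

```r
male <- c("weekly", "monthly", NA, "never", NA, NA, "weekly")
female <- c(NA, NA, "never", NA, "daily", "weekly", NA)

# Keep male where it is present, otherwise fall back to female
combined_base <- ifelse(is.na(male), female, male)
print(combined_base)
```

coalesce() is still more convenient when there are more than two columns, since it accepts any number of vectors.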
However, note that if the empty cells in your original data file contain any whitespace, or are empty strings (""), as in the example in the question, then coalesce will not work: an empty string is different from a missing value.
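A minimal sketch of the fix for data like the question's, assuming dplyr is installed: trim whitespace, turn the empty strings into NA with dplyr's na_if(), and only then coalesce. The column name combined matches the question; converting it to a factor is a suggestion, not something the question specifies.

```r
library(dplyr)

# Original data, where the "missing" cells are empty strings rather than NA
male <- c("weekly", "monthly", "", "never", "", "", "weekly")
female <- c("", "", "never", "", "daily", "weekly", "")
df <- data.frame(male, female)

# trimws() strips stray whitespace; na_if() then turns "" into NA
# so that coalesce() can fall through to the other column
df$combined <- coalesce(
  na_if(trimws(df$male), ""),
  na_if(trimws(df$female), "")
)

# Store as a factor so glm() creates one dummy per non-reference level
df$combined <- factor(df$combined)
print(levels(df$combined))
```

With the real data (which also contains the outcome and main_predictor columns), the model would then be fitted as glm(outcome ~ main_predictor + combined, data = df, family = binomial(link = "logit")), and each frequency category should appear only once in the output.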