I want to add a column that flags high costs so I can predict it with a glm. I use this code:
df %>%
mutate(high_costs = case_when(Totalcosts>=4000~"1",
Totalcosts<4000~"0"
))
This apparently gives me the right values, but now I have 2 questions:
How can I actually add this column to my df?
Is it possible (with different code) to make the output numeric instead of a factor, because I will predict 0 or 1 in my glm? Or do I have to use code like
df$y <- as.numeric(as.factor(df$high_costs))
CodePudding user response:
Oh yes.
- You just need to assign the result to a new variable (or, if you wish to go full Rambo, reassign it to df again, though I would strongly advise against this).
df_1 = df %>%
mutate(high_costs = case_when(Totalcosts>=4000~"1",
Totalcosts<4000~"0"
))
You could also have used ifelse() syntax, but I do enjoy the SQL crossover of the case_when() usage too.
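For comparison, here is a minimal sketch of the ifelse() version. The toy data frame and its values are invented for illustration; only the Totalcosts column name comes from the question.

```r
library(dplyr)

# Toy data, invented for illustration only
df <- data.frame(Totalcosts = c(1200, 4000, 5300, 3999))

# ifelse() picks the second argument where the condition is TRUE
# and the third where it is FALSE
df_1 <- df %>%
  mutate(high_costs = ifelse(Totalcosts >= 4000, "1", "0"))
```

With only two outcomes, ifelse() is the shorter option; case_when() tends to read better once there are three or more conditions.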
- Yes. First off, the easiest way. Drop the quotes.
df_1 = df %>%
mutate(high_costs = case_when(Totalcosts>=4000~1,
Totalcosts<4000~0
))
R will recognize these as numeric values.
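A quick way to confirm this, and to see the column used as a glm outcome, might look like the sketch below. The data values and the age predictor are hypothetical and stand in for whatever predictors the real df contains.

```r
library(dplyr)

# Toy data, invented for illustration; `age` is a hypothetical predictor
df <- data.frame(
  Totalcosts = c(1200, 4500, 5300, 3999, 800, 7000, 2500, 4100),
  age        = c(30, 45, 62, 70, 25, 41, 66, 50)
)

df_1 <- df %>%
  mutate(high_costs = case_when(Totalcosts >= 4000 ~ 1,
                                Totalcosts < 4000 ~ 0))

# Check that the new column is numeric rather than character or factor
class(df_1$high_costs)

# Logistic regression on the 0/1 outcome
fit <- glm(high_costs ~ age, data = df_1, family = binomial)
```

Note that rows where Totalcosts is NA match neither condition, so case_when() leaves high_costs as NA for them.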
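A second approach, however, would be a little daisy chaining. This is needed because of how R stores factors: a factor holds integer codes (1, 2, ...) that index its levels, so calling as.numeric() directly on a factor returns those codes rather than the labels (https://www.guru99.com/r-factor-categorical-continuous.html#:~:text=Factor in R is a,integer data values as levels. - Note the second sentence in the highlighted portion). Converting to character first and then to numeric recovers the original values.
So, you could do it in multiple steps: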
df %>%
mutate(high_costs = case_when(Totalcosts>=4000~"1",
Totalcosts<4000~"0"
),
high_costs = as.character(high_costs),
high_costs = as.numeric(high_costs))
Or wrap it all at once, which is harder on the eye but requires less code.
df_1 = df %>%
mutate(high_costs = as.numeric(as.character(case_when(Totalcosts>=4000~"1",
Totalcosts<4000~"0"
))))
'df$y <- as.numeric(as.factor(df$high_costs))' will not work the way you wish: as.numeric() on a factor returns the internal level codes, so you get 1 and 2 instead of 0 and 1. R already stores those integer codes by virtue of the column being a factor, so there is rarely a reason to extract them. I strongly suggest you investigate the differences between characters and factors in R to understand why.
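The difference is easy to see on a small vector. This is a base R sketch with made-up values:

```r
# Made-up values standing in for df$high_costs
high_costs <- c("0", "1", "1", "0")

# Converting via a factor returns the internal level codes
via_factor <- as.numeric(as.factor(high_costs))
via_factor
# [1] 1 2 2 1

# Going through character first keeps the original labels
via_character <- as.numeric(as.character(as.factor(high_costs)))
via_character
# [1] 0 1 1 0
```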