Add new column with 0 and 1 depending on a numeric value in column x with mutate

Time:05-08

I want to add a column so that I can predict high costs with a glm. I use this code:

 df %>%
      mutate(high_costs = case_when(Totalcosts>=4000~"1",
                                     Totalcosts<4000~"0"
                                     ))

This apparently gives me the right values, but now I have 2 questions:

  1. How can I add this column actually to my df?

  2. Is it possible (by using other code) to make the output numeric instead of a factor, because I will predict 0 or 1 in my glm? Or do I have to use code like

    df$y <- as.numeric(as.factor(df$high_costs))

CodePudding user response:

Oh yes.

  1. You just need to assign the result to a new variable (or, if you wish to go full Rambo, reassign it to df again, though I would strongly advise against this).
df_1 = df %>%
      mutate(high_costs = case_when(Totalcosts>=4000~"1",
                                     Totalcosts<4000~"0"
                                     ))

You could also have used ifelse(), but I do enjoy the SQL crossover that case_when() offers.
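For comparison, here is a minimal sketch of the ifelse() variant. The sample data frame is hypothetical (only the Totalcosts column name comes from the question), and the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical sample data; only the column name Totalcosts is from the question
df <- data.frame(Totalcosts = c(1200, 4000, 5300, 3999))

# ifelse(test, yes, no): 1 where Totalcosts is at least 4000, otherwise 0
df_1 <- df %>%
  mutate(high_costs = ifelse(Totalcosts >= 4000, 1, 0))

print(df_1)
```

Because the yes/no values are unquoted numbers, high_costs comes out numeric here as well.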

  2. Yes. First off, the easiest way: drop the quotes.
df_1 = df %>%
      mutate(high_costs = case_when(Totalcosts>=4000~1,
                                     Totalcosts<4000~0
                                     ))

R will recognize these as numeric values.
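If you want to confirm this yourself, a quick check along these lines should do. Again the data frame is a made-up example, and dplyr is assumed to be installed.

```r
library(dplyr)

# Hypothetical sample data; only the column name Totalcosts is from the question
df <- data.frame(Totalcosts = c(1200, 4000, 5300, 3999))

# Unquoted 1 and 0 on the right-hand side give a numeric column
df_1 <- df %>%
  mutate(high_costs = case_when(Totalcosts >= 4000 ~ 1,
                                Totalcosts < 4000 ~ 0))

# Inspect the type of the new column
class(df_1$high_costs)  # "numeric"
```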

A second approach, however, would be a little daisy chaining. This is needed because of what R actually does when it turns a character or numeric vector into a factor: the values are stored as integer codes, with the original values kept as levels (https://www.guru99.com/r-factor-categorical-continuous.html#:~:text=Factor in R is a,integer data values as levels. - Note the second sentence in the highlighted portion)

So, you could do it in multiple steps:

df %>%
      mutate(high_costs = case_when(Totalcosts>=4000~"1",
                                     Totalcosts<4000~"0"
                                     ),
             high_costs = as.character(high_costs),
             high_costs = as.numeric(high_costs)) 
    

Or wrap it all at once, which is harder on the eye but requires less code.

df_1 = df %>%
      mutate(high_costs = as.numeric(as.character(case_when(Totalcosts>=4000~"1",
                                     Totalcosts<4000~"0"
                                     ))))

'df$y <- as.numeric(as.factor(df$high_costs))' will not work the way you wish: as.numeric() on a factor returns the internal integer codes (1 and 2 here), not the labels 0 and 1. R already keeps that integer coding by virtue of the column being a factor, so unless you have a specific reason to want those codes, convert through as.character() first. I strongly suggest you investigate the differences between characters and factors in R to gain further understanding as to why.
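A small base R sketch of the difference, using a made-up character vector in place of df$high_costs:

```r
# Hypothetical values standing in for df$high_costs
high_costs <- c("0", "1", "1", "0")

f <- as.factor(high_costs)

# as.numeric() on a factor returns the level codes, not the labels
codes <- as.numeric(f)
print(codes)   # 1 2 2 1

# Going through as.character() first recovers the original 0/1 values
values <- as.numeric(as.character(f))
print(values)  # 0 1 1 0
```

The codes start at 1 because factor levels are numbered from 1 in sorted order, so "0" becomes 1 and "1" becomes 2.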
