Column I've created is apparently "undefined"?

My code is as below. I have created a column "degree" based on another column which contains integers from 1 to 5.

My code below seems to work, because the column appears to have been created successfully. However, when I run any code based on the "degree" column, such as str(my_data$degree), I get NULL.

my_data %>%
mutate(degree = case_when(edcat > 3 ~ "1",                                 
 edcat <=3 ~ "0") )

This is what I get when I use "degree" in any code, despite the fact that I can see the column has been successfully created:

Error in [.data.frame(my_data, , "degree"): undefined columns selected
Traceback:

1. factor(my_data[, "degree"])
2. my_data[, "degree"]
3. [.data.frame(my_data, , "degree")
4. stop("undefined columns selected")

CodePudding user response：

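When you want to update (overwrite) a data frame with the result of a new calculation, assign it with <-, just as you would for any other variable. mutate() returns a modified copy and does not change my_data in place, so without the assignment the new column is only printed and then discarded. It is better to save the result in a new data frame, so you can check it and keep a copy of the original (handy for a beginner who wants to compare input and output); here I save it in my_result. Alternatively, assign it back with my_data <-.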

my_result<- my_data %>%
mutate(degree = case_when(
 edcat > 3 ~ "1",                                 
 edcat <=3 ~ "0"))

Or, if you want to keep using the same data frame in later steps:

my_data<- my_data %>%
    mutate(degree = case_when(
     edcat > 3 ~ "1",                                 
     edcat <=3 ~ "0"))

With sample data for edcat:

my_data <- data.frame('edcat'= c(1,2,3,5,6,8))
my_data <- my_data%>%mutate(degree = case_when(
  edcat > 3 ~ "1",                                 
  edcat <=3 ~ "0"))

my_data

  edcat degree
1     1      0
2     2      0
3     3      0
4     5      1
5     6      1
6     8      1

Now you can use the column any way you like, for example to count the rows for each value of degree:

my_data%>%group_by(degree)%>%summarise(N=n())
# A tibble: 2 x 2
  degree     N
  <chr>  <int>
1 0          3
2 1          3
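
To see why the original code looked like it worked but left my_data without the column, here is a minimal sketch using the same sample data. It assumes dplyr is installed; degree_factor is just an illustrative name, not something from the question.

```r
library(dplyr)

# Same sample data as above
my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))

# No assignment: the modified copy is shown at the console and then discarded
my_data %>%
  mutate(degree = case_when(edcat > 3 ~ "1",
                            edcat <= 3 ~ "0"))

"degree" %in% names(my_data)  # FALSE: my_data itself was not changed
str(my_data$degree)           # NULL, as in the question

# With assignment: the new column is stored in my_data
my_data <- my_data %>%
  mutate(degree = case_when(edcat > 3 ~ "1",
                            edcat <= 3 ~ "0"))

"degree" %in% names(my_data)  # TRUE

# The call from the traceback now works
degree_factor <- factor(my_data[, "degree"])
levels(degree_factor)         # "0" "1"
```

So the output you saw after the first call was only the printed result of mutate(), not the contents of my_data.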

All of this is basic dplyr, though. It is worth working through a good resource to learn it, such as Hadley Wickham's R for Data Science.