My code is below. I have created a column "degree" based on another column, edcat, which contains integers from 1 to 5.
The code seems to work, because I can see that the column has been created. However, whenever I run any code that uses the "degree" column, I get NULL, for example from str(my_data$degree).
my_data %>% mutate(degree = case_when(edcat > 3 ~ "1", edcat <=3 ~ "0") )
This is the error I get whenever I use "degree" in any code, even though I can see that the column has been created successfully:
Error in `[.data.frame`(my_data, , "degree"): undefined columns selected
Traceback:
- factor(my_data[, "degree"])
- my_data[, "degree"]
- `[.data.frame`(my_data, , "degree")
- stop("undefined columns selected")
Answer:
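mutate() does not change my_data in place: it returns a new data frame. Your code printed that new data frame (which is why you saw the column) and then discarded it, so my_data itself never got a "degree" column. To keep the result, assign it with <-, just as you would for any other variable. While you are learning, it is better to save it to a new data frame, so you can check the result and keep a copy of the original to compare input and output.
Here I save it to my_result; alternatively, assign it back with my_data <-.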
my_result <- my_data %>%
  mutate(degree = case_when(
    edcat > 3 ~ "1",
    edcat <= 3 ~ "0"))
Or, if you want to keep using the same data frame in later steps, overwrite it:
my_data <- my_data %>%
  mutate(degree = case_when(
    edcat > 3 ~ "1",
    edcat <= 3 ~ "0"))
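To see why the original code failed, here is a minimal sketch that compares the column names before and after assigning, assuming dplyr is installed and using a small made-up edcat column. The un-assigned call leaves my_data unchanged, which is exactly why my_data[, "degree"] raised "undefined columns selected".

```r
library(dplyr)

# Hypothetical sample data, for illustration only
my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))

# Not assigned: mutate() returns a new data frame, which is printed and then discarded
my_data %>% mutate(degree = case_when(edcat > 3 ~ "1", edcat <= 3 ~ "0"))
"degree" %in% names(my_data)   # FALSE: my_data itself was never changed
my_data$degree                 # NULL, as in the question

# Assigned: the result replaces my_data, so the new column persists
my_data <- my_data %>% mutate(degree = case_when(edcat > 3 ~ "1", edcat <= 3 ~ "0"))
"degree" %in% names(my_data)   # TRUE
```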
For example, with some sample values for edcat:
library(dplyr)
my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))
my_data <- my_data %>% mutate(degree = case_when(
  edcat > 3 ~ "1",
  edcat <= 3 ~ "0"))
my_data
  edcat degree
1     1      0
2     2      0
3     3      0
4     5      1
5     6      1
6     8      1
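For comparison, a base-R sketch of the same recode without dplyr. Here the `$<-` assignment writes the column into my_data directly, so there is no separate result to store. Note that ifelse() is swapped in for case_when() in this sketch, and it collapses the two conditions into a single test, edcat > 3.

```r
# Hypothetical sample data, for illustration only
my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8))

# `$<-` adds the new column to my_data itself; no separate assignment of a result is needed
my_data$degree <- ifelse(my_data$edcat > 3, "1", "0")

str(my_data$degree)   # chr [1:6] "0" "0" "0" "1" "1" "1"
```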
Now you can use the column however you like, for example to count the rows in each degree group:
my_data %>% group_by(degree) %>% summarise(N = n())
# A tibble: 2 x 2
  degree     N
  <chr>  <int>
1 0          3
2 1          3
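Once "degree" is actually stored in my_data, the factor() call from the question's traceback also works. A short sketch, recreating the sample data from the example above so the block runs on its own; the explicit levels argument is an assumption added here to fix the level order, not something the question used.

```r
library(dplyr)

# Same hypothetical sample data and recode as in the example above
my_data <- data.frame(edcat = c(1, 2, 3, 5, 6, 8)) %>%
  mutate(degree = case_when(edcat > 3 ~ "1", edcat <= 3 ~ "0"))

# The call from the traceback now finds the column instead of stopping
degree_f <- factor(my_data[, "degree"], levels = c("0", "1"))
table(degree_f)
```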
This is all fairly basic dplyr, so it is worth working through a good introduction such as Hadley Wickham's R for Data Science.