I have a data frame that consists of the names and scores of certain individuals:
name score
0 Ted 90
1 Rebecca 88
2 Roy 78
3 Leslie 85
4 Nathan 75
5 Jamie 70
6 Sam 78
7 Isaac 70
8 Keeley 85
9 Beard 90
10 Colin 70
11 Will 70
12 Jan 82
13 Richard 70
I want to add a new column called verdict that contains their degree based on their score. I used a loop to do that, hoping that the result would look like this:
name score verdict
0 Ted 90 Passed, Cum Laude
1 Rebecca 88 Passed, Cum Laude
2 Roy 78 Passed, Good
3 Leslie 85 Passed, Cum Laude
4 Nathan 75 Passed, Good
5 Jamie 70 Passed, Good
6 Sam 78 Passed, Good
7 Isaac 70 Passed, Good
8 Keeley 85 Passed, Cum Laude
9 Beard 90 Passed, Cum Laude
10 Colin 70 Passed, Good
11 Will 70 Passed, Good
12 Jan 82 Passed, Excellent
13 Richard 70 Passed, Good
I'm using the code below to do that, but nothing happens. The new column doesn't exist, and there is no error or warning message in the R console.
df$verdict <-
for (score in df$score){
if (score >= 85)
return('Passed, Cum Laude')
else if (score < 85 & score >= 80)
return('Passed, Excellent')
else if (score < 80 & score >= 70)
return('Passed, Good')
else if (score < 70 & score >= 60)
return('Passed')
else
return('Not Passed')
}
CodePudding user response:
Sure, this can be done with case_when or ifelse statements, but I think the best way here is to use the base function cut.
code
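# six break points define five score intervals, one per label
scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good", "Passed, Excellent", "Passed, Cum Laude")
# using dplyr
library(dplyr)
# include.lowest = TRUE keeps a score of exactly 100 in the top group (it would be NA otherwise)
df %>% mutate(verdict = cut(score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE))
# or in just base
df$verdict <- cut(df$score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE)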
output
name score verdict
1 Ted 90 Passed, Cum Laude
2 Rebecca 88 Passed, Cum Laude
3 Roy 78 Passed, Good
4 Leslie 85 Passed, Cum Laude
5 Nathan 75 Passed, Good
6 Jamie 70 Passed, Good
7 Sam 78 Passed, Good
8 Isaac 70 Passed, Good
9 Keeley 85 Passed, Cum Laude
10 Beard 90 Passed, Cum Laude
11 Colin 70 Passed, Good
12 Will 70 Passed, Good
13 Jan 82 Passed, Excellent
14 Richard 70 Passed, Good
data
df <- structure(list(name = c("Ted", "Rebecca", "Roy", "Leslie", "Nathan",
"Jamie", "Sam", "Isaac", "Keeley", "Beard", "Colin", "Will",
"Jan", "Richard"), score = c(90L, 88L, 78L, 85L, 75L, 70L, 78L,
70L, 85L, 90L, 70L, 70L, 82L, 70L)), row.names = c(NA, -14L), class = c("data.frame"))
sidenotes
- With cut, your breaks vector has one more item than your labels vector. This is because the groups are the intervals between consecutive breaks, so there is one group fewer than there are breaks. Here the 6 break values give these 5 groups: 0-60, 60-70, 70-80, 80-85 and 85-100.
- right = TRUE versus right = FALSE determines how the boundaries are treated; compare it with > versus >=. right = TRUE would have put those with a score of 70 in the "Passed" group, while with right = FALSE they fall in the "Passed, Good" group.
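To see the boundary handling concretely, here is a small sketch using a few made-up scores that sit exactly on the breaks (the vector x and the result names are only for illustration, they are not part of the data above):

```r
scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good", "Passed, Excellent", "Passed, Cum Laude")

# illustrative scores sitting exactly on the breaks
x <- c(60, 70, 85)

# right = FALSE: intervals are [lower, upper), so 70 belongs to [70, 80)
left_closed <- as.character(cut(x, breaks = scores, labels = score_labels, right = FALSE))

# right = TRUE (the default): intervals are (lower, upper], so 70 belongs to (60, 70]
right_closed <- as.character(cut(x, breaks = scores, labels = score_labels, right = TRUE))

print(data.frame(x, left_closed, right_closed))
```

With right = FALSE each boundary score lands in the higher group, which matches the >= conditions in the question.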
CodePudding user response:
The if statement in R is not vectorized, and you would instead want to use ifelse. In this case, the case_when() function from the dplyr library is a good fit for your requirement:
library(dplyr)

# conditions are checked in order; the first one that is TRUE wins
df$verdict <- case_when(
  df$score >= 85 ~ "Passed, Cum Laude",
  df$score >= 80 ~ "Passed, Excellent",
  df$score >= 70 ~ "Passed, Good",
  df$score >= 60 ~ "Passed",
  TRUE ~ "Not Passed"
)
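To illustrate the point about vectorization, here is a minimal base R sketch. The vector s and the labels "high" and "low" are made up for the example. if() expects a single TRUE or FALSE, whereas ifelse() tests every element of a vector:

```r
s <- c(90, 82, 55)  # made-up scores for illustration

# if() needs a condition of length one, so it has to be applied per element
one_at_a_time <- character(length(s))
for (i in seq_along(s)) {
  if (s[i] >= 85) one_at_a_time[i] <- "high" else one_at_a_time[i] <- "low"
}

# ifelse() evaluates the test for the whole vector at once
all_at_once <- ifelse(s >= 85, "high", "low")

print(one_at_a_time)
print(all_at_once)
```

Both approaches give the same result here. In R 4.2.0 and later, passing a condition of length greater than one to if() is an error; older versions used only the first element and gave a warning.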
CodePudding user response:
When you have multiple ifelse statements, consider using dplyr::case_when() instead:
Code:
library(dplyr)
df %>%
  mutate(verdict = case_when(
    score >= 85 ~ 'Passed, Cum Laude',
    score < 85 & score >= 80 ~ 'Passed, Excellent',
    score < 80 & score >= 70 ~ 'Passed, Good',
    score < 70 & score >= 60 ~ 'Passed',
    TRUE ~ 'Not Passed'
  ))
Output:
name score verdict
<char> <int> <char>
1: Ted 90 Passed, Cum Laude
2: Rebecca 88 Passed, Cum Laude
3: Roy 78 Passed, Good
4: Leslie 85 Passed, Cum Laude
5: Nathan 75 Passed, Good
6: Jamie 70 Passed, Good
7: Sam 78 Passed, Good
8: Isaac 70 Passed, Good
9: Keeley 85 Passed, Cum Laude
10: Beard 90 Passed, Cum Laude
11: Colin 70 Passed, Good
12: Will 70 Passed, Good
13: Jan 82 Passed, Excellent
14: Richard 70 Passed, Good
CodePudding user response:
The return function is used only to return a value from a function, inside the definition of the function itself. R should tell you there is no function to return from.
Also, by looping over the score values, you're not telling R which row and column the value should go into.
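Even without the return() calls, the assignment would not create the column: a for loop always evaluates to NULL, and assigning NULL to a data frame column that does not exist leaves the data frame unchanged. A small sketch with a toy data frame (toy and loop_value are illustrative names, not from the question):

```r
toy <- data.frame(score = c(90, 70))

# the value of a for loop is always NULL, whatever happens in its body
loop_value <- for (s in toy$score) s
print(is.null(loop_value))

# assigning NULL to a column that does not exist changes nothing
toy$verdict <- loop_value
print(names(toy))
```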
If you want to avoid loading a full library just to get access to the case_when function, you can change your code a little.
First example, using apply to loop along the rows.
df$verdict <- apply(df, MARGIN = 1, FUN = function(X) {
  # apply() turns each row into a character vector here (name is text),
  # so convert the score back to a number before comparing
  s <- as.numeric(X["score"])
  if (s >= 85)
    return('Passed, Cum Laude')
  else if (s < 85 & s >= 80)
    return('Passed, Excellent')
  else if (s < 80 & s >= 70)
    return('Passed, Good')
  else if (s < 70 & s >= 60)
    return('Passed')
  else
    return('Not Passed')
})
Or, you can use the ifelse function, which is vectorised. However, it is not easy to read or debug.
df$verdict<-ifelse(df$score>=85,'Passed, Cum Laude',
ifelse(df$score < 85 & df$score >= 80,'Passed, Excellent',
ifelse(df$score < 80 & df$score >= 70,'Passed, Good',
ifelse(df$score < 70 & df$score >= 60,'Passed',
'Not Passed'))))
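Because each nested ifelse() is only reached when the previous test was FALSE, the upper bounds are redundant. Here is a sketch of a shorter form, checked against the full version on a few made-up scores (s, full and short are illustrative names):

```r
s <- c(90, 85, 82, 70, 65, 40)  # made-up scores for illustration

# the form used above, with both bounds in every test
full <- ifelse(s >= 85, 'Passed, Cum Laude',
        ifelse(s < 85 & s >= 80, 'Passed, Excellent',
        ifelse(s < 80 & s >= 70, 'Passed, Good',
        ifelse(s < 70 & s >= 60, 'Passed',
               'Not Passed'))))

# lower bounds only: the order of the tests does the rest
short <- ifelse(s >= 85, 'Passed, Cum Laude',
         ifelse(s >= 80, 'Passed, Excellent',
         ifelse(s >= 70, 'Passed, Good',
         ifelse(s >= 60, 'Passed',
                'Not Passed'))))

print(identical(full, short))
```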
If you want to loop along the rows, you should loop over their index:
# create the column first so each row can be filled in by index
df$verdict <- NA_character_
for (r in seq_len(nrow(df))) {
  if (df$score[r] >= 85)
    df$verdict[r] <- 'Passed, Cum Laude'
  else if (df$score[r] < 85 & df$score[r] >= 80)
    df$verdict[r] <- 'Passed, Excellent'
  else if (df$score[r] < 80 & df$score[r] >= 70)
    df$verdict[r] <- 'Passed, Good'
  else if (df$score[r] < 70 & df$score[r] >= 60)
    df$verdict[r] <- 'Passed'
  else
    df$verdict[r] <- 'Not Passed'
}
rm(r)