Create new column using data from If Else Loop in R

I have a data frame that consists of the names and scores of certain individuals:

       name  score
0       Ted     90
1   Rebecca     88
2       Roy     78
3    Leslie     85
4    Nathan     75
5     Jamie     70
6       Sam     78
7     Isaac     70
8    Keeley     85
9     Beard     90
10    Colin     70
11     Will     70
12      Jan     82
13  Richard     70

I want to add a new column called verdict that contains their degree based on their score. I used a loop to do that, hoping that the result would look like this:

       name  score               verdict
0       Ted     90     Passed, Cum Laude
1   Rebecca     88     Passed, Cum Laude
2       Roy     78          Passed, Good
3    Leslie     85     Passed, Cum Laude
4    Nathan     75          Passed, Good
5     Jamie     70          Passed, Good
6       Sam     78          Passed, Good
7     Isaac     70          Passed, Good 
8    Keeley     85     Passed, Cum Laude
9     Beard     90     Passed, Cum Laude
10    Colin     70          Passed, Good
11     Will     70          Passed, Good
12      Jan     82     Passed, Excellent
13  Richard     70          Passed, Good

I'm using the code below to do that, but nothing happens. The new column doesn't exist, and there is no error or warning message in the R console.

df$verdict <-
  for (score in df$score){
    if (score >= 85)
      return('Passed, Cum Laude')
    else if (score < 85 & score >= 80)
      return('Passed, Excellent')
    else if (score < 80 & score >= 70)
      return('Passed, Good')
    else if (score < 70 & score >= 60)
      return('Passed')
    else
      return('Not Passed')
}

CodePudding user response：

Sure, this can be done with case or ifelse statements, but I think the best way here is to use the base function cut.

code

scores <- c(0, 60, 70, 80, 85, 100)
score_labels <- c("Not Passed", "Passed", "Passed, Good", "Passed, Excellent", "Passed, Cum Laude")

# using dplyr
# (include.lowest = TRUE keeps a score of exactly 100 in the top group)
library(dplyr)
df %>% mutate(verdict = cut(score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE))

# or in just base
df$verdict <- cut(df$score, breaks = scores, labels = score_labels, right = FALSE, include.lowest = TRUE)

output

      name score           verdict
1      Ted    90 Passed, Cum Laude
2  Rebecca    88 Passed, Cum Laude
3      Roy    78      Passed, Good
4   Leslie    85 Passed, Cum Laude
5   Nathan    75      Passed, Good
6    Jamie    70      Passed, Good
7      Sam    78      Passed, Good
8    Isaac    70      Passed, Good
9   Keeley    85 Passed, Cum Laude
10   Beard    90 Passed, Cum Laude
11   Colin    70      Passed, Good
12    Will    70      Passed, Good
13     Jan    82 Passed, Excellent
14 Richard    70      Passed, Good

data

df <- structure(list(name = c("Ted", "Rebecca", "Roy", "Leslie", "Nathan", 
"Jamie", "Sam", "Isaac", "Keeley", "Beard", "Colin", "Will", 
"Jan", "Richard"), score = c(90L, 88L, 78L, 85L, 75L, 70L, 78L, 
70L, 85L, 90L, 70L, 70L, 82L, 70L)), row.names = c(NA, -14L), class = c("data.frame"))

sidenotes

With cut, your breaks vector has one more item than your labels vector. This is because the groups are the intervals between consecutive breaks, so there is always one group fewer than there are breaks. Here the 6 break values give these 5 groups: 0-60, 60-70, 70-80, 80-85 and 85-100.
right = TRUE versus right = FALSE controls how the boundaries are treated; compare it with > versus >=. With right = TRUE each interval includes its upper boundary, so those with a score of 70 would fall in the group "Passed", while with right = FALSE each interval includes its lower boundary, so they fall in the "Passed, Good" group.
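
To make the boundary behaviour concrete, here is a small sketch that runs cut on three boundary scores. The vector x and the result names are made up for illustration; the breaks and labels are the ones from the answer above. It also shows that with right = FALSE a score of exactly 100 falls outside the last interval unless include.lowest = TRUE is set.

```r
breaks <- c(0, 60, 70, 80, 85, 100)
labels <- c("Not Passed", "Passed", "Passed, Good",
            "Passed, Excellent", "Passed, Cum Laude")

# Hypothetical boundary scores, chosen only to show the interval rules
x <- c(70, 85, 100)

# right = TRUE: intervals are (lower, upper], so 70 lands in (60, 70]
right_closed <- as.character(
  cut(x, breaks = breaks, labels = labels, right = TRUE))

# right = FALSE: intervals are [lower, upper), so 70 lands in [70, 80)
# and 100 is outside [85, 100), which gives NA
left_closed <- as.character(
  cut(x, breaks = breaks, labels = labels, right = FALSE))

# include.lowest = TRUE closes the last interval, making it [85, 100]
left_closed_incl <- as.character(
  cut(x, breaks = breaks, labels = labels, right = FALSE,
      include.lowest = TRUE))

print(data.frame(x, right_closed, left_closed, left_closed_incl))
```

If scores above 100 can never occur, another option is to use Inf as the last break instead of 100.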

CodePudding user response：

The if statement in R is not vectorized, and you would instead want to use ifelse. In this case, the case_when() function from the dplyr library is a good fit for your requirement:

library(dplyr)

df$verdict <- case_when(
    df$score >= 85 ~ "Passed, Cum Laude",
    df$score >= 80 ~ "Passed, Excellent",
    df$score >= 70 ~ "Passed, Good",
    df$score >= 60 ~ "Passed",
    TRUE ~ "Not Passed"
)
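
As a rough illustration of what "not vectorized" means, the sketch below feeds a two-element vector to if and to ifelse. The vector two_scores and the "high"/"low" labels are made up for the example. As far as I know, recent R versions (4.2.0 and later) raise an error when the if condition has more than one element, while older versions only warned and used the first element, so the sketch catches both cases.

```r
# Hypothetical two-element input, just to show the difference
two_scores <- c(90, 70)

# `if` expects a single TRUE/FALSE; a longer condition is an error
# (or a warning in older R versions), so we catch either outcome
if_result <- tryCatch(
  if (two_scores >= 85) "high" else "low",
  error   = function(e) "error",
  warning = function(w) "warning"
)

# `ifelse` works element by element and returns one value per score
ifelse_result <- ifelse(two_scores >= 85, "high", "low")

print(if_result)
print(ifelse_result)
```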

CodePudding user response：

When you have multiple ifelse statements, consider using dplyr::case_when() instead:

Code:

library(dplyr)
df %>% 
  mutate(verdict = case_when(
    score >= 85 ~ 'Passed, Cum Laude',
    score < 85 & score >= 80 ~ 'Passed, Excellent',
    score < 80 & score >= 70 ~ 'Passed, Good',
    score < 70 & score >= 60 ~ 'Passed',
    TRUE ~ 'Not Passed'
  ))

Output:

       name score           verdict
     <char> <int>            <char>
 1:     Ted    90 Passed, Cum Laude
 2: Rebecca    88 Passed, Cum Laude
 3:     Roy    78      Passed, Good
 4:  Leslie    85 Passed, Cum Laude
 5:  Nathan    75      Passed, Good
 6:   Jamie    70      Passed, Good
 7:     Sam    78      Passed, Good
 8:   Isaac    70      Passed, Good
 9:  Keeley    85 Passed, Cum Laude
10:   Beard    90 Passed, Cum Laude
11:   Colin    70      Passed, Good
12:    Will    70      Passed, Good
13:     Jan    82 Passed, Excellent
14: Richard    70      Passed, Good

CodePudding user response：

The return function is only used to return a value from a function, inside the definition of the function itself. R should tell you there is no function to return from.

Also, by looping over the score values, you never tell R which row of the new column each value belongs to.
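
There is one more thing worth knowing here: a for loop in R always evaluates to NULL, so assigning the loop itself to df$verdict cannot create a column, even without the return calls. The sketch below shows this on a two-row data frame; small_df and loop_value are names made up for the example.

```r
# Hypothetical two-row data frame using two rows from the question
small_df <- data.frame(name = c("Ted", "Roy"), score = c(90L, 78L))

# The value of a for loop is always NULL, whatever the body computes
loop_value <- for (s in small_df$score) s * 2
print(is.null(loop_value))

# Assigning NULL to a column that does not exist changes nothing
small_df$verdict <- loop_value
print(names(small_df))
```

So the values have to be written into the column inside the loop, or produced by a function that returns one value per row.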

If you want to avoid loading a whole library just to get access to the case_when function, you can change your code a little.

First example, using sapply to loop over the scores. (apply on the whole data frame would first convert every row to character, so the comparisons would be done on strings and break for scores such as 100.)

df$verdict <- sapply(df$score, FUN = function(s){
    # s is a single numeric score
    if (s >= 85)
        return('Passed, Cum Laude')
    else if (s < 85 && s >= 80)
        return('Passed, Excellent')
    else if (s < 80 && s >= 70)
        return('Passed, Good')
    else if (s < 70 && s >= 60)
        return('Passed')
    else
        return('Not Passed')
    })

Or, you can use the ifelse function which is vectorised. However, it is not easy to read or debug.

df$verdict <- ifelse(df$score >= 85, 'Passed, Cum Laude',
              ifelse(df$score < 85 & df$score >= 80, 'Passed, Excellent',
              ifelse(df$score < 80 & df$score >= 70, 'Passed, Good',
              ifelse(df$score < 70 & df$score >= 60, 'Passed',
                     'Not Passed'))))
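
If the nesting gets hard to read, one base R alternative is to look up the group index with findInterval and use it to pick a label. This is only a sketch and not part of the original answer: the vector example_scores is made up (its last two values exist only to exercise the lower groups), and the thresholds are the ones from the question.

```r
# Lower bounds of each passing group, in increasing order
lower_bounds <- c(60, 70, 80, 85)
verdict_labels <- c("Not Passed", "Passed", "Passed, Good",
                    "Passed, Excellent", "Passed, Cum Laude")

# Hypothetical scores covering all five groups
example_scores <- c(90, 82, 70, 65, 50)

# findInterval returns how many lower bounds each score has reached
# (0 to 4), so adding 1 gives the position of the matching label
group_index <- findInterval(example_scores, lower_bounds)
example_verdict <- verdict_labels[group_index + 1]

print(data.frame(example_scores, example_verdict))
```

On a data frame, the same lookup would be df$verdict <- verdict_labels[findInterval(df$score, lower_bounds) + 1].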

If you want to loop along the rows, you should loop over their index:

df$verdict <- NA_character_  # create the column before filling it
for (r in seq_len(nrow(df))){
  if (df$score[r] >= 85)
    df$verdict[r] <- 'Passed, Cum Laude'
  else if (df$score[r] < 85 && df$score[r] >= 80)
    df$verdict[r] <- 'Passed, Excellent'
  else if (df$score[r] < 80 && df$score[r] >= 70)
    df$verdict[r] <- 'Passed, Good'
  else if (df$score[r] < 70 && df$score[r] >= 60)
    df$verdict[r] <- 'Passed'
  else
    df$verdict[r] <- 'Not Passed'
}
rm(r)  # remove the loop index from the workspace