A function to create a new column based on existing one in a dataframe using ifelse()

Time:10-20

Hi, I have a simple function called cont_to_cat that I cannot get to work. It should add a new column to the data frame df, derived from an existing column of the same data frame whose name is passed to the function as the string score_col. The new column should be named 'cat_' followed by score_col, and its values should depend on conditions on the existing column (first condition: if the score is below 6, the new column equals "Rating: 6-", and so on).

#dataframe to test
df <- as.data.frame(matrix(runif(n=10, min=0, max=10), nrow=10))
cont_to_cat<-function(df,score_col){
  s<-paste('cat_',score_col,sep='')
  print(s)
  print(df[[score_col]])
  df$s <- with(df, ifelse(df$score_col <6,'Rating: 6-',
                          ifelse(df$score_col >=6 & df$score_col < 7, 'Good: 6 ',
                                 ifelse(df$score_col >=7 & df$score_col < 8, 'Very good: 7 ',
                                        ifelse(df$score_col >=8 & df$score_col < 9, 'Fabulous: 8 ',
                                               ifelse(df$score_col >= 9, 'Superb: 9 ','NULL'))))))
  return(df)
}
new_df<-cont_to_cat(df,'V1')

After I run this code I get the error below: (Error in $<-.data.frame(*tmp*, "s", value = logical(0)) : replacement has 0 rows, data has 10). This bug has had me stuck for a while. I appreciate your help.

CodePudding user response:

You can use case_when from dplyr to avoid the nested ifelse calls:

> library(dplyr)
> set.seed(15151)
> df <- as.data.frame(matrix(runif(n=10, min=0, max=10), nrow=10))
> cont_to_cat <- function(df, score_col){  

  # score_col is a string, so look the column up by name with .data[[score_col]]
  output <-  mutate(df, score = case_when(.data[[score_col]] < 6 ~ "Rating: 6-",
                             .data[[score_col]] >= 6 & .data[[score_col]] < 7  ~ "Good:6 ",
                             .data[[score_col]] >= 7  & .data[[score_col]] < 8 ~ "Very good: 7 ",
                             .data[[score_col]] >= 8 & .data[[score_col]] < 9 ~ "Fabulous: 8 ",
                             .data[[score_col]] >= 9 ~ "Superb: 9 "))
  
  names(output)[names(output) == "score"] <- paste0("cat_", score_col)

  return(output)
  
}
> cont_to_cat(df, score_col = "V1")

         V1   cat_V1
1  9.209413 Superb: 9 
2  2.704474 Rating: 6-
3  5.798527 Rating: 6-
4  9.527739 Superb: 9 
5  3.151152 Rating: 6-
6  2.834159 Rating: 6-
7  2.198065 Rating: 6-
8  3.471788 Rating: 6-
9  4.057178 Rating: 6-
10 6.823411    Good:6 
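
As for the error in the question, it most likely comes from df$score_col: the $ operator looks for a column literally named "score_col" rather than using the string stored in the variable, so it returns NULL and ifelse() has nothing to work on. A minimal sketch of the difference, using a made-up three-row data frame and a simplified two-label ifelse() (both are only for illustration):

```r
# Toy data frame with one numeric column (hypothetical values).
df <- data.frame(V1 = c(2.5, 6, 9.3))
score_col <- "V1"

# `$` matches the literal name "score_col", which does not exist, so this is NULL.
is.null(df$score_col)

# Comparing NULL with a number gives a zero-length logical,
# so ifelse() also returns a zero-length result (hence "0 rows").
length(ifelse(df$score_col < 6, "low", "high"))

# `[[` evaluates the variable, so it picks up the column named "V1".
df[[score_col]]

# The same applies to the new column: df$s would create a column called "s",
# whereas df[[s]] uses the name stored in s.
s <- paste0("cat_", score_col)
df[[s]] <- ifelse(df[[score_col]] < 6, "Rating: 6-", "6 or above")
names(df)
```

So inside a function that receives the column name as a string, df[[score_col]] and df[[s]] are the forms to use in place of df$score_col and df$s.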

CodePudding user response:

I think you'd be missing a trick if you didn't take advantage of cut, which does this in one go without you having to write out the start and end of each range manually:

cont_to_cat <- function(data, score_col) {
    s <- paste0('cat_', score_col)
    data[s] <- cut(
        data[[score_col]],
        breaks=c(0,6,7,8,9,Inf),
        labels=c("Rating: 6-", "Good: 6 ", "Very Good: 7 ", "Fabulous: 8 ", "Superb: 9 "),
        right=FALSE  # left-closed bins [6,7), [7,8), ... to match the >= / < conditions in the question
    )
    data
}

Using @Jilber's example data:

cont_to_cat(df, score_col="V1")
#         V1     cat_V1
#1  9.209413 Superb: 9 
#2  2.704474 Rating: 6-
#3  5.798527 Rating: 6-
#4  9.527739 Superb: 9 
#5  3.151152 Rating: 6-
#6  2.834159 Rating: 6-
#7  2.198065 Rating: 6-
#8  3.471788 Rating: 6-
#9  4.057178 Rating: 6-
#10 6.823411   Good: 6 
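
One detail worth checking with cut is which end of each bin is closed. By default (right=TRUE) the bins are closed on the right, so a score of exactly 6 falls into the lowest bin, while right=FALSE gives left-closed bins that line up with conditions such as score >= 6 & score < 7. A small sketch on made-up scores that sit exactly on the break points:

```r
# Hypothetical scores chosen to sit exactly on the break points.
scores <- c(6, 7, 9)
breaks <- c(0, 6, 7, 8, 9, Inf)
labels <- c("Rating: 6-", "Good: 6 ", "Very Good: 7 ", "Fabulous: 8 ", "Superb: 9 ")

# Default right=TRUE: bins are (0,6], (6,7], ..., so a 6 lands in the lowest bin.
right_closed <- as.character(cut(scores, breaks = breaks, labels = labels))
right_closed

# right=FALSE: bins are [0,6), [6,7), ..., so a 6 lands in "Good: 6 ".
left_closed <- as.character(cut(scores, breaks = breaks, labels = labels, right = FALSE))
left_closed
```

With continuous random scores the two settings will rarely differ, but for rounded or integer ratings the choice decides which label a boundary value gets.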

You could even add the breaks and labels as parameters to the function so that it is even more flexible and can be reused for other categorisations:

cont_to_cat2 <- function(data, score_col, breaks, labels) {
    s <- paste0('cat_', score_col)
    data[s] <- cut(data[[score_col]], breaks=breaks, labels=labels, right=FALSE)
    data
}

cont_to_cat2(
    df, score_col="V1", breaks=c(0,6,7,8,9,Inf),
    labels=c("Rating: 6-", "Good: 6 ", "Very Good: 7 ", "Fabulous: 8 ", "Superb: 9 ")
)