Hey, I have a simple function called cont_to_cat that I cannot get to work. I want it to add a new column to the data frame df based on an existing column of the same data frame, whose name is passed to the function as a string in the parameter score_col. The new column should be named 'cat_' followed by score_col, and its values should depend on conditions on the other column (first condition: if the score is below 6, the new column equals "Rating: 6-", and so on).
#dataframe to test
df <- as.data.frame(matrix(runif(n=10, min=0, max=10), nrow=10))
cont_to_cat<-function(df,score_col){
s<-paste('cat_',score_col,sep='')
print(s)
print(df[[score_col]])
df$s <- with(df, ifelse(df$score_col <6,'Rating: 6-',
ifelse(df$score_col >=6 & df$score_col < 7, 'Good: 6 ',
ifelse(df$score_col >=7 & df$score_col < 8, 'Very good: 7 ',
ifelse(df$score_col >=8 & df$score_col < 9, 'Fabulous: 8 ',
ifelse(df$score_col >= 9, 'Superb: 9 ','NULL'))))))
return(df)
}
new_df<-cont_to_cat(df,'V1')
After I run this code I get the error below:
Error in `$<-.data.frame`(`*tmp*`, "s", value = logical(0)) :
replacement has 0 rows, data has 10
This bug has had me stuck for a while. I'd appreciate your help.
CodePudding user response:
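You can use case_when from dplyr to avoid nested ifelse. Since score_col holds the column name as a string, look the column up with .data[[score_col]] inside mutate; comparing score_col itself would compare the string "V1" with the numbers rather than the column values.
> library(dplyr)
> set.seed(15151)
> df <- as.data.frame(matrix(runif(n=10, min=0, max=10), nrow=10))
> cont_to_cat <- function(df, score_col){
    # .data[[score_col]] picks the column whose name is stored in score_col
    output <- mutate(df, score = case_when(.data[[score_col]] < 6 ~ "Rating: 6-",
                     .data[[score_col]] >= 6 & .data[[score_col]] < 7 ~ "Good:6 ",
                     .data[[score_col]] >= 7 & .data[[score_col]] < 8 ~ "Very good: 7 ",
                     .data[[score_col]] >= 8 & .data[[score_col]] < 9 ~ "Fabulous: 8 ",
                     .data[[score_col]] >= 9 ~ "Superb: 9 "))
    # rename the temporary column to cat_<score_col>
    names(output)[names(output) == "score"] <- paste0("cat_", score_col)
    return(output)
  }
> cont_to_cat(df, score_col = "V1")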
V1 cat_V1
1 9.209413 Superb: 9
2 2.704474 Rating: 6-
3 5.798527 Rating: 6-
4 9.527739 Superb: 9
5 3.151152 Rating: 6-
6 2.834159 Rating: 6-
7 2.198065 Rating: 6-
8 3.471788 Rating: 6-
9 4.057178 Rating: 6-
10 6.823411 Good:6
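For context on the original error, here is a minimal base R sketch. Inside the question's function, df$score_col looks for a column literally named score_col, finds none and returns NULL, so the comparison yields logical(0) and the assignment fails. Indexing with [[ ]] uses the string stored in the variable instead. The function name cont_to_cat_base and the small test data frame are made up for illustration, and the labels are written without the trailing spaces.

```r
# Hypothetical base R variant of the question's function (name is illustrative).
cont_to_cat_base <- function(df, score_col) {
  s <- paste0("cat_", score_col)   # name of the new column, e.g. "cat_V1"
  x <- df[[score_col]]             # [[ ]] looks up the column named by the string
  # Conditions are checked in order, so each test only needs the upper bound.
  df[[s]] <- ifelse(x < 6, "Rating: 6-",
             ifelse(x < 7, "Good: 6",
             ifelse(x < 8, "Very good: 7",
             ifelse(x < 9, "Fabulous: 8", "Superb: 9"))))
  df
}

# Small hand-made data frame so the result does not depend on random numbers.
test_df <- data.frame(V1 = c(2.5, 6, 7.2, 8.9, 9.5))
result <- cont_to_cat_base(test_df, "V1")
print(result)
```

Note that df[[s]] <- ... is also needed on the left-hand side: df$s would create a column literally called s rather than cat_V1.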
CodePudding user response:
I think you'd be missing a trick not to take advantage of cut, which does this in one go without having to write out the start and end of each range manually:
cont_to_cat <- function(data, score_col) {
  s <- paste0('cat_', score_col)
  data[s] <- cut(
    data[[score_col]],
    breaks=c(0,6,7,8,9,Inf),
    labels=c("Rating: 6-", "Good: 6 ", "Very Good: 7 ", "Fabulous: 8 ", "Superb: 9 "),
    right=FALSE  # intervals [0,6), [6,7), ... to match the >= / < conditions in the question
  )
  data
}
Using @Jilber's example data:
cont_to_cat(df, score_col="V1")
# V1 cat_V1
#1 9.209413 Superb: 9
#2 2.704474 Rating: 6-
#3 5.798527 Rating: 6-
#4 9.527739 Superb: 9
#5 3.151152 Rating: 6-
#6 2.834159 Rating: 6-
#7 2.198065 Rating: 6-
#8 3.471788 Rating: 6-
#9 4.057178 Rating: 6-
#10 6.823411 Good: 6
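One detail worth checking is how cut treats values that fall exactly on a break. By default the intervals are closed on the right, so a score of exactly 6 lands in (0,6], while right = FALSE gives [6,7), which matches the >= 6 & < 7 style of conditions in the question. A small sketch with made-up boundary values (the vector x and the trimmed labels are for illustration only):

```r
# Made-up scores sitting on or near the breaks.
x <- c(0, 5.9, 6, 7, 9)
breaks <- c(0, 6, 7, 8, 9, Inf)
labels <- c("Rating: 6-", "Good: 6", "Very good: 7", "Fabulous: 8", "Superb: 9")

# Default: intervals (0,6], (6,7], ... so 0 falls outside every interval and becomes NA.
right_closed <- cut(x, breaks = breaks, labels = labels)

# right = FALSE: intervals [0,6), [6,7), ..., [9,Inf).
left_closed <- cut(x, breaks = breaks, labels = labels, right = FALSE)

print(data.frame(x, right_closed, left_closed))
```

With the default, the value 6 is labelled "Rating: 6-" and 0 is NA; with right = FALSE, 6 is labelled "Good: 6" and 0 is "Rating: 6-".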
You could even add the breaks and labels as parameters to the function so that it is even more flexible and can be reused for other categorisations:
cont_to_cat2 <- function(data, score_col, breaks, labels) {
  s <- paste0('cat_', score_col)
  # right=FALSE makes each interval include its lower break and exclude its upper one
  data[s] <- cut(data[[score_col]], breaks=breaks, labels=labels, right=FALSE)
  data
}
cont_to_cat2(
df, score_col="V1", breaks=c(0,6,7,8,9,Inf),
labels=c("Rating: 6-", "Good: 6 ", "Very Good: 7 ", "Fabulous: 8 ", "Superb: 9 ")
)
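As a rough illustration of that reuse, the sketch below applies the same idea to a different categorisation. Everything in it is hypothetical: the variant name cont_to_cat_flex, the extra ... argument that is forwarded to cut, and the made-up marks data.

```r
# Hypothetical variant that forwards extra arguments (e.g. right = FALSE) to cut().
cont_to_cat_flex <- function(data, score_col, breaks, labels, ...) {
  s <- paste0("cat_", score_col)
  data[s] <- cut(data[[score_col]], breaks = breaks, labels = labels, ...)
  data
}

# Made-up exam marks, split into two bands: [0, 50) and [50, Inf).
marks <- data.frame(mark = c(12, 49.9, 50, 87))
banded <- cont_to_cat_flex(marks, "mark", breaks = c(0, 50, Inf),
                           labels = c("Fail", "Pass"), right = FALSE)
print(banded)
```

Forwarding ... keeps the function signature short while still letting the caller decide which side of each interval is closed.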