How to add a new column to a data frame using a function

Time:06-10

I need to create a column in a data frame based on a conditional rule that applies only to the values of a single column called "value".

I have already written the function, but when I try to apply it to the data frame I get the error shown below.

fun <- function(x) { #function
  if (x < 10) {
    return("white")
  } else if (x >= 10 && x < 30) {
    return("yellow")
  } else if (x >= 30 && x < 50) {
    return("red")
  } else {
    return("black")
  }
}

#applying as a column on dataframe
dt$newcolumn <- apply(dt, 1, fun(dt$value))

Error:

  Error in if (x < 10) { : missing value where TRUE/FALSE needed
    In addition: Warning message:
    In if (x < 10) { :
      the condition has length > 1 and only the first element will be used

I am probably doing something wrong when applying the function to the data frame.

Thanks in advance!

CodePudding user response:

As suggested in the comments, you can use the vectorised ifelse or dplyr::case_when, which avoids apply or any other loop. The function can then be applied directly to the whole column.

library(dplyr)

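fun <- function(x) { # vectorised version of the rule
  # conditions are checked in order, so x < 30 here means 10 <= x < 30;
  # this also covers non-integer values such as 29.5
  case_when(x < 10 ~ "white",
            x < 30 ~ "yellow",
            x < 50 ~ "red",
            TRUE ~ "black")
}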
dt <- data.frame(value = c(5, 89, 30, 15))
dt$newcolumn <- fun(dt$value)
dt

#  value newcolumn
#1     5     white
#2    89     black
#3    30       red
#4    15    yellow
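
For comparison, the base R ifelse route mentioned above can be sketched as follows. This is only a minimal sketch, not part of the original answer: the name fun_ifelse is hypothetical, and the thresholds are copied from the function in the question.

```r
# Hypothetical helper: nested ifelse, vectorised over x (base R only)
fun_ifelse <- function(x) {
  ifelse(x < 10, "white",
         ifelse(x < 30, "yellow",
                ifelse(x < 50, "red", "black")))
}

dt <- data.frame(value = c(5, 89, 30, 15))
dt$newcolumn <- fun_ifelse(dt$value)
dt$newcolumn
# [1] "white"  "black"  "red"    "yellow"
```

One difference worth knowing: for a missing value, ifelse returns NA, whereas the TRUE ~ "black" fallback in case_when labels it "black". Which behaviour is wanted depends on the data.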

CodePudding user response:

The condition in an if statement must evaluate to a single TRUE or FALSE, whereas in your case it evaluates to a vector of TRUE and FALSE values, which triggers the warning shown (the error itself means the first value tested was NA). The vectorized equivalent is ifelse, but it is not ideal here because it requires nested ifelse calls. For multiple output options one can use case_when (switch only handles a single value at a time), but the neatest base R solution here is probably just to use cut:

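fun <- function(x) {
  # right = FALSE gives the intervals [-Inf, 10), [10, 30), [30, 50), [50, Inf),
  # matching the rule in the question without an epsilon offset
  as.character(cut(x, c(-Inf, 10, 30, 50, Inf),
                   labels = c('white', 'yellow', 'red', 'black'),
                   right = FALSE))
}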

fun(1:100)
#> [1] "white"  "white"  "white"  "white"  "white"  "white"  "white"  "white" 
#> [9] "white"  "yellow" "yellow" "yellow" "yellow" "yellow" "yellow" "yellow"
#> [17] "yellow" "yellow" "yellow" "yellow" "yellow" "yellow" "yellow" "yellow"
#> [25] "yellow" "yellow" "yellow" "yellow" "yellow" "red"    "red"    "red"   
#> [33] "red"    "red"    "red"    "red"    "red"    "red"    "red"    "red"   
#> [41] "red"    "red"    "red"    "red"    "red"    "red"    "red"    "red"   
#> [49] "red"    "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [57] "black"  "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [65] "black"  "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [73] "black"  "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [81] "black"  "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [89] "black"  "black"  "black"  "black"  "black"  "black"  "black"  "black" 
#> [97] "black"  "black"  "black"  "black" 
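
If the scalar if/else version from the question is to be kept, it has to be called once per element rather than once on the whole column. Note that apply(dt, 1, fun(dt$value)) also calls fun immediately instead of passing the function itself. A minimal sketch using base R's vapply, assuming value is numeric; the name fun_scalar and the NA guard are additions for illustration, not part of the original post:

```r
# Scalar version of the rule; handles one value at a time
fun_scalar <- function(x) {
  if (is.na(x)) return(NA_character_)  # added guard: if() cannot test NA
  if (x < 10) {
    "white"
  } else if (x < 30) {
    "yellow"
  } else if (x < 50) {
    "red"
  } else {
    "black"
  }
}

# Example data with a missing value (assumed, for illustration)
dt <- data.frame(value = c(5, 89, 30, 15, NA))

# Pass the function itself (no parentheses); vapply calls it per element
# and checks that every result is a single character string
dt$newcolumn <- vapply(dt$value, fun_scalar, character(1))
```

This gives the same labels as the vectorised versions and NA for the missing row, but it loops in R, so it will generally be slower than cut or case_when on large columns.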