Home > Software design >  How to add a column to an R dataframe flagging activity in other columns?
How to add a column to an R dataframe flagging activity in other columns?

Time:07-09

In the reproducible working code shown at the bottom of this post, I'm trying to add a column ("ClassAtSplit", shown in the image below with the header highlighted orange) to the R dataframe that flags the activity in other columns of that dataframe. I could use a series of cumbersome conditionals and offsets like I do in the Excel formula for "ClassAtSplit" shown in the right-most column in the image below labeled "ClassAtSplitFormula", but I am looking for an efficient way to do this in R using something like dplyr. Any suggestions for doing this?

I'm not trying to reproduce "ClassAtSplitFormula"! I only included it to show the formulas in "ClassAtSplit" column.

enter image description here

Reproducible code:

library(dplyr)

data <- 
  data.frame(
    Element = c("C","B","D","A","A","A","C","B","B","B"),
    SplitCode = c(0,0,0,0,1,1,0,0,2,2)  
  )

data <- data %>% group_by(Element) %>% mutate(PreSplitClass=row_number())
data

CodePudding user response:

Likely there is the most efficient way to solve this just in one or two lines but with a case_when() you can implement fast and clearly the nested ifelse excel formulas using dplyr.

library(dplyr)

data <- 
  data.frame(
    Element = c("C","B","D","A","A","A","C","B","B","B"),
    SplitCode = c(0,0,0,0,1,1,0,0,2,2)  
  )

data <- data %>% group_by(Element) %>% mutate(PreSplitClass=row_number()) %>% ungroup()

data %>%
  mutate(ClassAtSplit =
           case_when(
             SplitCode == 0 ~as.integer(0), # This eliminates checking for > 0
             SplitCode > lag(SplitCode) ~ PreSplitClass, # if > previous value
             SplitCode == SplitCode ~ lag(PreSplitClass) # if equal (0s are avoided)
           )
         )

Output:

# A tibble: 10 × 4
   Element SplitCode PreSplitClass ClassAtSplit
   <chr>       <dbl>         <int>        <int>
 1 C               0             1            0
 2 B               0             1            0
 3 D               0             1            0
 4 A               0             1            0
 5 A               1             2            2
 6 A               1             3            2
 7 C               0             2            0
 8 B               0             2            0
 9 B               2             3            3
10 B               2             4            3

If you specify the first case SplitCode == 0 then you won't need to check all the time if the values are equal only because they are 0s.

  • Related