In the reproducible working code shown at the bottom of this post, I'm trying to add a column ("ClassAtSplit", shown in the image below with the header highlighted orange) to the R dataframe that flags the activity in other columns of that dataframe. I could use a series of cumbersome conditionals and offsets like I do in the Excel formula for "ClassAtSplit" shown in the right-most column in the image below labeled "ClassAtSplitFormula", but I am looking for an efficient way to do this in R using something like dplyr
. Any suggestions for doing this?
I'm not trying to reproduce "ClassAtSplitFormula"! I only included it to show the formulas in "ClassAtSplit" column.
Reproducible code:
library(dplyr)
data <-
data.frame(
Element = c("C","B","D","A","A","A","C","B","B","B"),
SplitCode = c(0,0,0,0,1,1,0,0,2,2)
)
data <- data %>% group_by(Element) %>% mutate(PreSplitClass=row_number())
data
CodePudding user response:
Likely there is the most efficient way to solve this just in one or two lines but with a case_when()
you can implement fast and clearly the nested ifelse excel formulas using dplyr
.
library(dplyr)
data <-
data.frame(
Element = c("C","B","D","A","A","A","C","B","B","B"),
SplitCode = c(0,0,0,0,1,1,0,0,2,2)
)
data <- data %>% group_by(Element) %>% mutate(PreSplitClass=row_number()) %>% ungroup()
data %>%
mutate(ClassAtSplit =
case_when(
SplitCode == 0 ~as.integer(0), # This eliminates checking for > 0
SplitCode > lag(SplitCode) ~ PreSplitClass, # if > previous value
SplitCode == SplitCode ~ lag(PreSplitClass) # if equal (0s are avoided)
)
)
Output:
# A tibble: 10 × 4
Element SplitCode PreSplitClass ClassAtSplit
<chr> <dbl> <int> <int>
1 C 0 1 0
2 B 0 1 0
3 D 0 1 0
4 A 0 1 0
5 A 1 2 2
6 A 1 3 2
7 C 0 2 0
8 B 0 2 0
9 B 2 3 3
10 B 2 4 3
If you specify the first case SplitCode == 0
then you won't need to check all the time if the values are equal only because they are 0s.