I am performing simple column-wise math operations on data frame rows that also involve accessing adjacent, previous data frame rows. Although the below code works, it's cumbersome (at least with respect to my liberal use of cbind()
and subset()
functions) and I wonder if there's a clean way to get the same results using an apply()
or other super duper R function. In base R if possible.
I'm adding and subtracting column values in each data frame row (referring to the below columns, "plus1" "plus 2" - "minus" = "total"), and if the id number is the same as you move down from one row to the next, adding in the plus1 from the prior row. See below illustration:
id plus1 plus2 minus total [total explained]
1 1 3 5 10 -2
2 2 4 5 9 0
3 3 8 5 8 5 [8 5 - 8 = 5, ignoring "plus1" in row 2 since "id" changed between rows 2 and 3]
4 3 1 4 7 6 [1 4 - 7, 8 from "plus1" col in row 3 since "id" is same in rows 3 and 4, = 6]
5 3 2 5 6 2 [2 5 - 6, 1 from "plus1" col in row 4 since "id" is same in rows 4 and 5, = 2]
6 5 3 6 5 4 [3 6 - 5 = 4, ignoring "plus1" in row 5 since "id" changed between rows 5 and 6]
Here is the code I used to generate the above:
data <- data.frame(id=c(1,2,3,3,3,5),
plus1=c(3,4,8,1,2,3),
plus2=c(5,5,5,4,5,6),
minus = c(10,9,8,7,6,5))
data <- cbind(data,
tmp1=(data[ ,"plus1"]
data[ ,"plus2"] -
data[ ,"minus"]
)
)
grp <- with(rle(data$id), rep(seq_along(values), lengths))
data$tmp2 <- with(data,ave(plus1, grp, FUN = function(x) c(0, x[-length(x)])))
data <- cbind(data, total = round((data[ ,"tmp1"] data[ ,"tmp2"]),2))
data <- subset(data, select = -c(tmp1,tmp2) )
data
I'm pursuing simplicity in the world of apply()
because I'll be using many derivations of this sort thing in my current project. Looks like I'm mimicking Excel in R, which I am.
CodePudding user response:
I think a simple way to do this is using the lag function from dplyr package. I used case_when to check if the id changed. If it didn't change, you add the extra term, otherwise you add 0.
library(dplyr)
data2<-data %>%
mutate(extra=case_when(
id==lag(id) ~ lag(plus1),
TRUE ~ 0
)) %>%
mutate(computed_total=plus1 plus2-minus extra)
CodePudding user response:
Here's a base R solution, but not with apply
, so may not be acceptable, although it seems less complex than the OP's code.
By way of comparison have added a dplyr
solution although I am aware OP asked specifically for base R.
data$lag_id <- c(0, data$id[-nrow(data)])
data$lag_plus1 <- c(NA, data$plus1[-nrow(data)])
data$total <- with(data, ifelse(id == lag_id, plus1 plus2 - minus lag_plus1, plus1 plus2 - minus))
data[ , -c(5:6)]
#> id plus1 plus2 minus total
#> 1 1 3 5 10 -2
#> 2 2 4 5 9 0
#> 3 3 8 5 8 5
#> 4 3 1 4 7 6
#> 5 3 2 5 6 2
#> 6 5 3 6 5 4
library(dplyr)
data %>%
mutate(total = case_when(id == lag(id) ~ plus1 plus2 - minus lag(plus1),
TRUE ~ plus1 plus2 - minus))
#> id plus1 plus2 minus total
#> 1 1 3 5 10 -2
#> 2 2 4 5 9 0
#> 3 3 8 5 8 5
#> 4 3 1 4 7 6
#> 5 3 2 5 6 2
#> 6 5 3 6 5 4
Created on 2021-12-11 by the reprex package (v2.0.1)