Mutate a variable by condition using two variables in a long-format data.table in R

Time:12-07

In this data.table:

dt <- data.table(id=c(1,1,1,2,2,2), time=rep(1:3,2), x=c(1,0,0,0,1,0))
dt
   id time x
1:  1    1 1
2:  1    2 0
3:  1    3 0
4:  2    1 0
5:  2    2 1
6:  2    3 0

I need the following:

   id time x
1:  1    1 1
2:  1    2 1
3:  1    3 1
4:  2    1 0
5:  2    2 1
6:  2    3 1

that is

  1. if x==1 at time==1 then x=1 at times 2 and 3, by id
  2. if x==1 at time==2 then x=1 at time 3, by id

For the first point (I guess the second one will be similar), I have tried approaches mentioned in similar questions I posted before (here and here), but none work:

  • dt[x==1[time == 1], x := x[time == 1], id] gives an error
  • setDT(dt)[, x2:= ifelse(x==1 & time==1, x[time==1], x), by=id] changes x only at time 1 (so no real change is observed)

It would be much easier to work with the data.table in wide format, but I keep facing this kind of problem in long format and I don't want to reshape my data every time.

Thank you!

EDIT:

The answer provided by @GregorThomas, dt[, x := cummax(x), by = id], works for the problem that I presented.

Now I ask the same question for a character variable:

dt2 <- data.table(id=c(1,1,1,2,2,2), time=rep(1:3,2), x=c('a','b','b','b','a','b'))
dt2
   id time x
1:  1    1 a
2:  1    2 b
3:  1    3 b
4:  2    1 b
5:  2    2 a
6:  2    3 b

In the table above, how could the following be done:

  1. if x=='a' at time==1 then x='a' at times 2 and 3, by id
  2. if x=='a' at time==2 then x='a' at time 3, by id

CodePudding user response:

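Use the cumulative maximum function cummax, grouped by id. Because x only takes the values 0 and 1, the running maximum within each id stays at 1 once a 1 has appeared (this assumes the rows are ordered by time within each id):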

dt[, x := cummax(x), by = id]
dt
#    id time x
# 1:  1    1 1
# 2:  1    2 1
# 3:  1    3 1
# 4:  2    1 0
# 5:  2    2 1
# 6:  2    3 1
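
For the character variable in the EDIT, cummax cannot be applied to x directly, but the same idea might carry over by taking the cumulative maximum of the logical test x == 'a'. The following is only a sketch, not part of the original answer: it assumes that 'a' is the value to carry forward and that sorting by id and time is acceptable. fifelse and setorder are data.table functions.

```r
library(data.table)

dt2 <- data.table(id = c(1, 1, 1, 2, 2, 2),
                  time = rep(1:3, 2),
                  x = c("a", "b", "b", "b", "a", "b"))

# Assumption: carrying forward only makes sense if rows are in time order
# within each id, so sort first (a no-op for this example).
setorder(dt2, id, time)

# x == "a" is TRUE/FALSE; cummax() coerces it to 0/1, so within each id
# the result is 1 from the first "a" onward. Those rows are set to "a",
# all earlier rows keep their original value.
dt2[, x := fifelse(cummax(x == "a") == 1L, "a", x), by = id]

dt2$x
# "a" "a" "a" "b" "a" "a"
```

Like the numeric version, this fills forward from the first 'a' at any time point, which covers both conditions (time 1 and time 2) in one step.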