Home > Software engineering >  How to use for loop and if conditions together in data.table?
How to use for loop and if conditions together in data.table?

Time:02-23

I have a data.table dt in R like this:

        ukey      time    orderType      a       edge        h1        h2
 1: 10600877   93003000       Buy       14.12      100      0.002      0.0002    
 2: 10600188   93003000       Buy       24.29      100      0.003      0.0003 
 3: 10600995   93003000       Buy       17.23      100      0.003      0.0003 
 4: 10600262   93003000      Sell       18.89      100      0.002      0.0004 
 5: 10600399   93003000      Sell       18.66      100      0.004      0.0002 
 6: 10603260   93003000      Sell       110.39     100      0.003      0.0002 
 7: 11000004   93003000      Sell       22.98      100      0.002      0.0003 
 ...

I want to get a new column called check by some conditions. My sample code looks like this:

for (i in 1:nrow(dt)) {
    if (dt[i, a - 1.5 * h1 > 111]) {
        dt[, check:= "A"]
        break
    }
    else if (dt[i, a  edge - 1.3 * h1- 0.3 * h2> 111]) {
        dt[, check:= "B"]
        break
    }
    else if (dt[i, a- 1.5 * h1> 80]) {
        dt[, check:= "C"]
        break
    }
    else {dt[, check:= "D"]}
}

However, my code doesn't work well and it is inefficient. Could you give me some better solutions?Many thanks!

CodePudding user response:

You can try to use case_when in dplyr package.

dt[, check := case_when((a - 1.5 * h1) > 111 ~ "A",
                        (a  edge - 1.3 * h1- 0.3 * h2)> 111 ~ "B",
                        (a- 1.5 * h1) > 80 ~ "C",
                        TRUE ~ "D")]

Result looks like :

> dt
       ukey     time orderType      a edge    h1    h2 check
1: 10600877 93003000       Buy  14.12  100 0.002 2e-04     B
2: 10600188 93003000       Buy  24.29  100 0.003 3e-04     B
3: 10600995 93003000       Buy  17.23  100 0.003 3e-04     B
4: 10600262 93003000      Sell  18.89  100 0.002 4e-04     B
5: 10600399 93003000      Sell  18.66  100 0.004 2e-04     B
6: 10603260 93003000      Sell 110.39  100 0.003 2e-04     B
7: 11000004 93003000      Sell  22.98  100 0.002 3e-04     B

Hope it is useful.

CodePudding user response:

What you are looking for is a faster ifelse equivalent

library(data.table)

setDT(dt)

# initialise
dt[, check := "D"]

# faster ifelse
dt[a- 1.5 * h1> 80, check := "C"
   ][a  edge - 1.3 * h1- 0.3 * h2> 111, check := "B"
     ][a - 1.5 * h1 > 111, check := "A"
       ]

Notice the latter subset conditions take precedence. Thus, orders are reversed i.e. C, B, A. Anything not captured by our conditions are left with initialised value i.e. D

Similar post & performance gain vs dplyr

  • Related