Home > Mobile >  R data.table only perform operation on group if condition is met
R data.table only perform operation on group if condition is met

Time:10-29

I have a data.table that is supposed to remove all rows per group until a negative number is met in value (including the row with the negative number itself). However, if there is no negative number in value I would like to keep all rows from that group.

# Example data    
group = rep(1:4,each=3)
value = c(1,2,3,1,-2,3,1,2,-3,-1,2,3)
DT = data.table(group,value)

> DT
    group  value row_idx
 1:     1      1       1
 2:     1      2       2
 3:     1      3       3
 4:     2      1       1
 5:     2     -2       2
 6:     2      3       3
 7:     3      1       1
 8:     3      2       2
 9:     3     -3       3
10:     4     -1       1
11:     4      2       2
12:     4      3       3

My attempt so far:

DT[,row_idx := seq_len(.N), by = "group"] #append row index per group
DT[,.SD[row_idx > (which(sign(value) == -1))], by = "group"] 

   group  value row_idx
1:     2      3       3
2:     4      2       2
3:     4      3       3

In this example group 1 is being deleted although I would like to keep it as no negative number is present in this group. I can check for the presence/absence of negative signs in value by DT[,(-1) %in% sign(value), by = "group"] but I do not know how to use this to achieve what I want.

CodePudding user response:

We may use a if/else condition

library(data.table)
DT[DT[, if(any(sign(value) < 0)) 
    .I[row_idx > (which(sign(value) == -1))] else .I, by = group]$V1]

-output

   group value row_idx
   <int> <num>   <int>
1:     1     1       1
2:     1     2       2
3:     1     3       3
4:     2     3       3
5:     4     2       2
6:     4     3       3

Or slightly more compact option

DT[DT[, .I[seq_len(.N) > match(-1, sign(value), nomatch = 0)], group]$V1]
   group value
   <int> <num>
1:     1     1
2:     1     2
3:     1     3
4:     2     3
5:     4     2
6:     4     3

CodePudding user response:

DT[, .SD[if (min(value) > 0) TRUE else -(1:which.max(value < 0))], by = group]
#    group value
#    <int> <num>
# 1:     1     1
# 2:     1     2
# 3:     1     3
# 4:     2     3
# 5:     4     2
# 6:     4     3
  • Related