Home > Blockchain >  data.frame test between lines
data.frame test between lines

Time:10-08

I have this following data.frame

df=data.frame(Abs=c("At1", "At1", "At1", "At2", "At2", "At3", "At4", "At4", "At4", "At4", "At5", "At5", "At6", "At6", "At6"),
              vid=c("id1", "id2", "id3", "id4", "id5", "id1", "id1", "id2", "id5", "id6", "id7", "id8", "id8", "id9", "id12"),
              ert=c(1,1,2,7,8,4,5,5,5,9,6,5,2,2,2),
               Tre= c(1256, 1260, 854, 147,8962, 87452, 1452, 1500, 1502, 147, 210, 258,1400,1850,1865))

> df
   Abs  vid ert   Tre
1  At1  id1   1  1256
2  At1  id2   1  1260
3  At1  id3   2   854
4  At2  id4   7   147
5  At2  id5   8  8962
6  At3  id1   4 87452
7  At4  id1   5  1452
8  At4  id2   5  1500
9  At4  id5   5  1502
10 At4  id6   9   147
11 At5  id7   6   210
12 At5  id8   5   258
13 At6  id8   2  1400
14 At6  id9   2  1850
15 At6 id12   2  1865

I would like to subset some lines based on several conditions comparing different line on the dataframe. The rule to follow is : keep lines that have a same Abs a different vid a same ert an absolute difference in Tre < 300.

Following this rule I would like to obtain this second dataframe

> df2
  Abs  vid ert  Tre
1 At1  id1   1 1256
2 At1  id2   1 1260
3 At4  id1   5 1452
4 At4  id2   5 1500
5 At4  id5   5 1502
6 At6  id9   2 1850
7 At6 id12   2 1865

My issue is to test on lines. Usually I test value for a line between differents variable but don't know how to apply test to compare between lines without using multiple forloop.

CodePudding user response:

Here's a dplyr approach that should work fine for smallish data. If your data is large (my intuition being on the order of >100,000 rows), it will start to make sense to convert to a data.table or sqldf solution. This is a question that can be solved more efficiently with a "non-equi join", which can't currently be done in dplyr, so I'm using a join filter workaround.

library(dplyr)
df_row <- df %>% mutate(row = row_number())
df_row %>%
  left_join(df_row, by = c("Abs", "ert")) %>%
  filter(vid.x != vid.y, abs(Tre.x - Tre.y) < 300) %>%
  distinct(Abs, vid = vid.x, ert, Tre = Tre.x)

Result

  Abs ert  vid  Tre
1 At1   1  id1 1256
2 At1   1  id2 1260
3 At4   5  id1 1452
4 At4   5  id2 1500
5 At4   5  id5 1502
6 At6   2  id9 1850
7 At6   2 id12 1865
  • Related