I am trying to identify Ethereum addresses in my dataframe that are not only buyers but also seller. The dataframe consists of transactions with that have the following variables:
hash, seller, buyer, ethamount, buyerid, date, price, dollvalue
I want to remove all rows that have a buyer or seller address that is unique in the dataset, meaning it is only involved in one transaction.I tried this in excel but it is far too slow since my original dataset has more than three million rows.
head(df, 50)
looks like this after I removed the other variables for simplification but I am not sure whether there is a purchase sale pair included, hence the screenshot in which the red marked address ending on 9a36 would be an example of an address I want to keep.
seller buyer buyerid date
1 0x63a9975ba31b0b9626b34300f7f627147df1f526 0xc1f1fadf3bdc72a0efba48f4e7d78be6d4528528 984508 2016-03-10
2 0x63a9975ba31b0b9626b34300f7f627147df1f526 0x1f57f826caf594f7a837d9fc092456870a289365 162018 2016-03-10
3 0x63a9975ba31b0b9626b34300f7f627147df1f526 0x1f57f826caf594f7a837d9fc092456870a289365 162018 2016-03-10
4 0x63a9975ba31b0b9626b34300f7f627147df1f526 0x1f57f826caf594f7a837d9fc092456870a289365 162018 2016-03-10
5 0x00cb2f4cc15e422561c908f9c507800352334115 0x964936ac5a3bbcc8e5d4a896dfa7cc884d8b2c10 764345 2016-03-10
6 0x63a9975ba31b0b9626b34300f7f627147df1f526 0xe8108287b6c088da9779442fa9b7b777b02fb2f7 1177049 2016-03-10
7 0x28ef2ca10d09496bd909fa6121b5046db9a645f2 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-10
8 0x21eab61ee90ab1a79ca1a2fe783b5d14a9a35402 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-10
9 0x63a9975ba31b0b9626b34300f7f627147df1f526 0x1f57f826caf594f7a837d9fc092456870a289365 162018 2016-03-10
10 0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 0xed11ceb51449d2d7cd8fa170330a714a3982cef4 1202248 2016-03-10
11 0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 0xe9996c54c9700ce56096ab9e0dce14cc5a7bc513 1184782 2016-03-10
12 0x8fa4bd105740d9bd158b4a278128612d74a29c4f 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-10
13 0xf8b483dba2c3b7176a3da549ad41a48bb3121069 0x9adc56eb1cd505ced003b040b357a5fea2b542de 787441 2016-03-10
14 0x63a9975ba31b0b9626b34300f7f627147df1f526 0x1f57f826caf594f7a837d9fc092456870a289365 162018 2016-03-10
15 0x63a9975ba31b0b9626b34300f7f627147df1f526 0xa7a7a7a7abf34a075ba0bebec3925ada35ecd90e 852055 2016-03-10
16 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xd614cc8e7d44e6e5d48b9b3efd5ffec36098f403 1086441 2016-03-10
17 0xa09fc8ed5cbe8871eb8bb410f487758c9004a6ef 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-10
18 0x102aabe751de044642ebaf25aca682edb0d0f849 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-10
19 0x68795c4aa09d6f4ed3e5deddf8c2ad3049a601da 0x0983bafc02494b6dc26bc5256e55d7e8b6c0d8b5 51233 2016-03-10
20 0xdc49aea495151789cb80a131fb5985456b4db003 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 94649 2016-03-10
21 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 0x7f42b732d80ec4442be958d793114f4660ce3f46 647398 2016-03-10
22 0x151255dd9e38e44db38ea06ec66d0d113d6cbe37 0x049dea544deba7045f44371e8441c340e3696046 26162 2016-03-10
23 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xa29862fb7f9b37374d0c9062ab52bdd74d1af867 826512 2016-03-10
24 0x0c729be7c39543c3d549282a40395299d987cec2 0x37085faf0bc61e425da19eb1e0b3e89c0767bd98 281690 2016-03-10
25 0x0c729be7c39543c3d549282a40395299d987cec2 0xbd0a30fe58ab04c3b5a61e098a34625c600a6fd8 959872 2016-03-10
26 0x0c729be7c39543c3d549282a40395299d987cec2 0x14217bbd1ebc23d138fcd08fa2cd6cf9c0b08d53 105174 2016-03-10
27 0x0c729be7c39543c3d549282a40395299d987cec2 0x24a724d8c8d3655e6dbb3bb7f14a5be2a232f54d 188932 2016-03-10
28 0x4bb96091ee9d802ed039c4d1a5f6216f90f81b01 0x31f09bf32a5f2bfbb5babb6a8eb5f4717628e756 255912 2016-03-10
29 0x2f6579f840068b7dc4b32d37ddeb07925a49bb45 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 94649 2016-03-11
30 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xba501ea5e8ba6db530d7a5a07715318c58222d40 946273 2016-03-11
31 0x04a57de2ddc48f3102905cf86cb55d0b1f071c71 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
32 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xe2f355fe1c681ffda642003386c86408997bde7b 1151362 2016-03-11
33 0x2910543af39aba0cd09dbb2d50200b3e800a63d2 0x24a9b889b8a702cbf610bbc0486a31c3b90066c5 188994 2016-03-11
34 0xa338ba84d5d4734988a5ed5cabc55ed61371e198 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
35 0x4537040f3399d9741e1dcf307cffbdd8ad11ef2d 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
36 0x0a5348bb95a09a12b9275df3f900e5bb3e490297 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
37 0x52bc44d5378309ee2abf1539bf71de1b7d7be3b5 0x45101ccadc7ab3bd8de13db6a03b413e1971daf3 352857 2016-03-11
38 0x32be343b94f860124dc4fee278fdcbd38c102d88 0xac211d37b17f2adfcb93b6e042215b300e157e4f 874878 2016-03-11
39 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xe14584cfbf45aa3dcc1fdf5b2124ed05639e2eae 1143175 2016-03-11
40 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0x83d471a8b398c8b97b094a7f362fbe5e571ef689 670548 2016-03-11
41 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0x46d9c9937ac3376c13c7ae082c8255a82efdc9d7 361879 2016-03-11
42 0x32be343b94f860124dc4fee278fdcbd38c102d88 0x426488248cb5a2a0f358d9bb557269f0be121f74 339340 2016-03-11
43 0x2a65aca4d5fc5b5c859090a6c34d164135398226 0xaeb5241b6dbf14492fa155c87dc5c42238ec1e26 887693 2016-03-11
44 0x4d67f4f0db45b3b5a3424b39ed8d972f710381d5 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
45 0xefaa23397c85df68ff1708b489856b006b30eed2 0x32be343b94f860124dc4fee278fdcbd38c102d88 259878 2016-03-11
46 0xf8b483dba2c3b7176a3da549ad41a48bb3121069 0xa28e16bf181b9114e8d2bebc500f5fddeb2174e1 826311 2016-03-11
47 0xe85eed429429188f5909f60c12c002567f5aeeb1 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 94649 2016-03-11
48 0x3ffdbe21f386d543a8e4151550da4107d697f352 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 94649 2016-03-11
49 0xf8b483dba2c3b7176a3da549ad41a48bb3121069 0xb3a0f076c5c07f41dcb2025b92c34f68165fa57d 912478 2016-03-11
50 0x38c6916869813de3cecc0d58ae64e563420d52f3 0x120a270bbc009644e35f0bb6ab13f95b8199c4ad 94649 2016-03-11
I hope this i somewhat clear, I could not find any other questoions related to this, anything else was related to single columns but nothing regarding duplicates over two columns. Any type of help is appreciated, thank you very much in advance.
CodePudding user response:
You could create a vector keep
that holds the ids of sellers/buyers that occur more than once; then use that to filter the rows of the table
library(data.table)
keep = data.table(id = c(df$seller, df$buyer))[,.N,id][N>1,id]
setDT(df)[seller %chin% keep | buyer %chin% keep]
CodePudding user response:
How about this:
uniqueSellers = unique(df['seller'])
uniqueBuyers = unique(df['buyers'])
BuyersAndSellers = unique(uniqueSellers uniqueBuyers)
Then you iterate through the rows and keep only those rows where either the seller or the buyer (or both) are in the BuyersAndSellers
list. Hope this helps :)