How to extract rows where a series of objects are either of the two columns?-CodePudding

I have the following df. The number of observations run upto hundreds of millions. I have a series that consists of a large number (around 15 million) of addresses. I want to extract rows if either from or to address can be found in the series.

    from_address to_address value   block_timestamp
0   0xcf1cc6s   0x272d9fa   1.01485 2018-06-03 09:13:22 00:00
1   0xbfb92c3   0x24d941b   3.097   2018-06-03 09:13:22 00:00
2   0x58076fd   0x5b89cb8   1.9089  2018-06-03 09:13:22 00:00
3   0x0000000   0xbe80455   7.71195 2018-06-03 09:13:22 00:00
4   0x0000000   0x334f5b0   7.71133 2018-06-03 09:13:22 00:00
5   0x0000000   0x9b54db8   1e 21   2018-06-03 09:13:22 00:00
6   0x84b06bd   0x2a0c0db   2.185   2018-06-03 10:55:46 00:00
7   0x59a5202   0x3122da2   1633277 2018-06-03 10:55:46 00:00
8   0x59a520a   0x02d9112   1.606   2018-06-03 10:55:46 00:00
9   0x5e57d43   0xd64d144   1.472   2018-06-03 01:02:14 00:00

My code:

series= list['address'].str.lower()

df_extracted = df[df['from_address'].isin(series) 
                                            | df['to_address'].isin(series)]

But somehow, when I look at the original df and look at all the rows which contains a specific address, i get 136 rows, whereas if i do the same with df_extracted, i only get 12. So when i am extracting rows using the above code, I am doing something wrong. Does anyone have any clue what I am doing wrong?

CodePudding user response：

This worked for me:

addresses = ['0x0000000', '0x2a0c0db']
df.query('from_address in @addresses or to_address in @addresses')

It's supposed to work the same way as your code so maybe check your dataframe just before you apply this operation.

The result I'm getting:


        from_add     to_add    value            block_timestamp
index                                                           
3      0x0000000  0xbe80455  7.71195  2018-06-03 09:13:22 00:00
4      0x0000000  0x334f5b0  7.71133  2018-06-03 09:13:22 00:00
5      0x0000000  0x9b54db8    1e 21  2018-06-03 09:13:22 00:00
6      0x84b06bd  0x2a0c0db    2.185  2018-06-03 10:55:46 00:00