Extracting information from a data frame if certain columns of the row are in a pre-defined list

Time:10-22

I have a list like this:

list1 = ['4361', '1856', '57586', '79017', '972', '974', '1829', '10787', '85477', '57019', '7431', '53616', '26228', '29085', '5217', '5527']

And then I have two columns of a data frame like this:

print(df['col A'][0:10])
0      6416
1     84665
2        90
3      2624
4      6118
5       375
6       377
7       377
9       351
10      333


print(df['col B'][0:10])
0      2318
1        88
2      2339
3      5371
4      6774
5     23163
6     23647
7     27236
9     10513
10     1600

I want to return only the rows of the data frame where col A or col B contains an item from the list.

I could imagine how to do this iteratively, something like this:

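for each_item in list1:
    for i, row in df.iterrows():
        # list1 holds strings, so compare against the string form of each cell;
        # `==` avoids accidental substring matches ('436' in '4361')
        if each_item == str(row['col A']) or each_item == str(row['col B']):
            print(row)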

I'm wondering if there's a neater way that doesn't loop over the whole data frame once per list item, since both the list and the data frame are quite big.
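If an explicit loop is still wanted, one way to drop the nested loop is to build a set from the list once and make a single pass over the frame, checking set membership (average O(1)) for each row. This is a minimal sketch: the sample data and the names wanted and matching_labels are hypothetical, and it assumes the columns hold integers, so the strings in list1 are converted with int().

```python
import pandas as pd

# Hypothetical sample data; the real df and list1 are much larger.
list1 = ['4361', '1856', '972', '2318']
df = pd.DataFrame({'col A': [6416, 84665, 90, 4361],
                   'col B': [2318, 88, 2339, 5371]})

# Build the lookup set once.
# Assumption: the columns are integer-typed, so convert the list's strings.
wanted = {int(x) for x in list1}

matching_labels = []
for label, row in df.iterrows():  # a single pass over the frame
    if row['col A'] in wanted or row['col B'] in wanted:
        matching_labels.append(label)

result = df.loc[matching_labels]
print(result)
```

This still iterates row by row in Python, so a vectorised mask is usually faster on large frames; the set only removes the outer loop over the list.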

I saw this snippet online, which returns the rows where df['col A'] equals one value OR df['col B'] equals another:

print(df[(df["col A"] == 1) | (df["col B"] == 2)])
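The snippet works because each comparison produces a boolean Series aligned with the frame's index, and | combines two such Series element-wise. A small sketch with made-up values (the frame below is hypothetical):

```python
import pandas as pd

df = pd.DataFrame({'col A': [1, 2, 3, 4],
                   'col B': [5, 2, 7, 8]})

# Each comparison yields one True/False per row.
mask_a = df['col A'] == 1   # True only for row 0
mask_b = df['col B'] == 2   # True only for row 1

# `|` is element-wise OR. The parentheses in the inline form are required
# because `|` binds more tightly than `==` in Python.
subset = df[(df['col A'] == 1) | (df['col B'] == 2)]
print(subset)
```

The same selection can be written as df[mask_a | mask_b]; naming the masks can make longer conditions easier to read.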

I'm unsure how to adapt this to match against a list of values. Can someone show me how to incorporate this idea into my code, or is my original .iterrows() loop the best way?

Answer:

Use isin to build a boolean mask for each column, then combine the two masks with | (element-wise OR):

print(df[(df['col A'].isin(list1)) | (df['col B'].isin(list1))])
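One caveat: isin compares values exactly, so if the columns are integers, the strings in list1 are not expected to match any rows. The sketch below assumes numeric columns (the sample data is hypothetical) and converts the list first. It also shows an equivalent form using DataFrame.isin followed by any(axis=1), which keeps rows where at least one of the selected columns matched.

```python
import pandas as pd

# Hypothetical sample mirroring the question's layout.
list1 = ['4361', '1856', '57586', '2318', '375']
df = pd.DataFrame({'col A': [6416, 84665, 90, 375],
                   'col B': [2318, 88, 2339, 23163]})

# Assumption: the columns are integer-typed, so convert the list's strings.
# Skip this step if the columns actually hold strings.
values = [int(x) for x in list1]

# Option 1: one mask per column, combined with element-wise OR.
by_column = df[df['col A'].isin(values) | df['col B'].isin(values)]

# Option 2: a boolean frame from DataFrame.isin, reduced row-wise with any().
by_any = df[df[['col A', 'col B']].isin(values).any(axis=1)]

print(by_column)
print(by_column.equals(by_any))
```

Option 2 scales more easily if more columns need checking: only the column list changes.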