Home > OS >  Pandas filtering all rows where is a keyword could be in multiple columns
Pandas filtering all rows where is a keyword could be in multiple columns

Time:09-30

ev is a dataframe that contains all the shots from the 20-21 NHL season. I want to filter it down to the shots where a certain player was on the ice. That player's name can be in any of the 14 columns that I have listed. if I did something like dboss = ev[ev['home_on_1'].str.contains('DYLAN.LARKIN')] I can get all the rows where that name appears in that single column. I can also do something like dboss = ev[ev['home_on_1'].str.contains('DYLAN.LARKIN') | ev['home_on_2'].str.contains('DYLAN.LARKIN') | ...] but this is tedious as I have to do it for each column and does not seem very python. I tried the following code,

col_list = ['home_on_1', 'home_on_2', 'home_on_3', 'home_on_4',
            'home_on_5', 'home_on_6', 'home_on_7', 'away_on_1', 'away_on_2',
            'away_on_3', 'away_on_4', 'away_on_5', 'away_on_6', 'away_on_7',]
dboss = ev[ev[col_list].str.contains('DYLAN.LARKIN')]

This returns AttributeError: 'DataFrame' object has no attribute 'str' if I remove the .str, I just get AttributeError: 'DataFrame' object has no attribute 'contains' Thanks in advance for the help!

edit to add more information

CodePudding user response:

If I understand you correctly, you want to filter your dataframe to return only the rows that a player's name appears, in any of the columns. If that, you can utilise apply(), which is not efficient, and any(), which will search for any condition being True, as demonstrated below:

player_name = 'DYLAN.LARKIN'
new = df[df[col_list].apply(lambda i: i.str.contains(player_name).any(), axis=1)]
  • axis=1 refers to row-wise computations.

Printing back new:

      home_on_1     home_on_2 home_on_3  ... away_on_5     away_on_6     away_on_7
0  DYLAN.LARKIN          John      John  ...      John  DYLAN.LARKIN          John
1          John          John      John  ...      John          John          John
2          John          John      John  ...      John          John          John
3          John          John      John  ...      John          John          John
4          John  DYLAN.LARKIN      John  ...      John          John  DYLAN.LARKIN
5          John          John      John  ...      John          John          John
6          John          John      John  ...      John          John          John

[7 rows x 14 columns]

will return back only the rows that contain at least once the player name you declare.

Sample DF:

{'home_on_1': {0: 'DYLAN.LARKIN',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'home_on_2': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'DYLAN.LARKIN',
  5: 'John',
  6: 'John'},
 'home_on_3': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'home_on_4': {0: 'John',
  1: 'DYLAN.LARKIN',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'home_on_5': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'DYLAN.LARKIN'},
 'home_on_6': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'home_on_7': {0: 'DYLAN.LARKIN',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'away_on_1': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'DYLAN.LARKIN',
  5: 'John',
  6: 'John'},
 'away_on_2': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'away_on_3': {0: 'John',
  1: 'DYLAN.LARKIN',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'away_on_4': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'DYLAN.LARKIN'},
 'away_on_5': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'away_on_6': {0: 'DYLAN.LARKIN',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'John',
  5: 'John',
  6: 'John'},
 'away_on_7': {0: 'John',
  1: 'John',
  2: 'John',
  3: 'John',
  4: 'DYLAN.LARKIN',
  5: 'John',
  6: 'John'}}

CodePudding user response:

Depends how you want to approach.

Approach 1

df['filter_col'] = df.apply(lambda x: any(x == 'string to find'), axis=1)
df[df.filter_col == True]

Approach 2

filter_columns = []
df[df[filter_columns].where(lambda x: x != 'string to find').isnull().sum(axis=1) == 1]
  • Related