ev is a dataframe that contains all the shots from the 20-21 NHL season. I want to filter it down to the shots where a certain player was on the ice. That player's name can be in any of the 14 columns that I have listed. if I did something like dboss = ev[ev['home_on_1'].str.contains('DYLAN.LARKIN')]
I can get all the rows where that name appears in that single column. I can also do something like dboss = ev[ev['home_on_1'].str.contains('DYLAN.LARKIN') | ev['home_on_2'].str.contains('DYLAN.LARKIN') | ...]
but this is tedious as I have to do it for each column and does not seem very python. I tried the following code,
col_list = ['home_on_1', 'home_on_2', 'home_on_3', 'home_on_4',
'home_on_5', 'home_on_6', 'home_on_7', 'away_on_1', 'away_on_2',
'away_on_3', 'away_on_4', 'away_on_5', 'away_on_6', 'away_on_7',]
dboss = ev[ev[col_list].str.contains('DYLAN.LARKIN')]
This returns AttributeError: 'DataFrame' object has no attribute 'str'
if I remove the .str, I just get AttributeError: 'DataFrame' object has no attribute 'contains'
Thanks in advance for the help!
edit to add more information
CodePudding user response:
If I understand you correctly, you want to filter your dataframe to return only the rows that a player's name appears, in any of the columns. If that, you can utilise apply()
, which is not efficient, and any()
, which will search for any condition being True
, as demonstrated below:
player_name = 'DYLAN.LARKIN'
new = df[df[col_list].apply(lambda i: i.str.contains(player_name).any(), axis=1)]
axis=1
refers to row-wise computations.
Printing back new
:
home_on_1 home_on_2 home_on_3 ... away_on_5 away_on_6 away_on_7
0 DYLAN.LARKIN John John ... John DYLAN.LARKIN John
1 John John John ... John John John
2 John John John ... John John John
3 John John John ... John John John
4 John DYLAN.LARKIN John ... John John DYLAN.LARKIN
5 John John John ... John John John
6 John John John ... John John John
[7 rows x 14 columns]
will return back only the rows that contain at least once the player name you declare.
Sample DF
:
{'home_on_1': {0: 'DYLAN.LARKIN',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'home_on_2': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'DYLAN.LARKIN',
5: 'John',
6: 'John'},
'home_on_3': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'home_on_4': {0: 'John',
1: 'DYLAN.LARKIN',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'home_on_5': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'DYLAN.LARKIN'},
'home_on_6': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'home_on_7': {0: 'DYLAN.LARKIN',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'away_on_1': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'DYLAN.LARKIN',
5: 'John',
6: 'John'},
'away_on_2': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'away_on_3': {0: 'John',
1: 'DYLAN.LARKIN',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'away_on_4': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'DYLAN.LARKIN'},
'away_on_5': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'away_on_6': {0: 'DYLAN.LARKIN',
1: 'John',
2: 'John',
3: 'John',
4: 'John',
5: 'John',
6: 'John'},
'away_on_7': {0: 'John',
1: 'John',
2: 'John',
3: 'John',
4: 'DYLAN.LARKIN',
5: 'John',
6: 'John'}}
CodePudding user response:
Depends how you want to approach.
Approach 1
df['filter_col'] = df.apply(lambda x: any(x == 'string to find'), axis=1)
df[df.filter_col == True]
Approach 2
filter_columns = []
df[df[filter_columns].where(lambda x: x != 'string to find').isnull().sum(axis=1) == 1]