I am trying to clean a dataframe. The code works on the entire dataframe, but when I test the snippet I get an error.
My dataframe:
df:
Unnamed: 0 game score home_odds draw_odds away_odds country league datetime home_team away_team home_score away_score
1114366 1114366 Estrella - Britannia 1:7 2.67 3.87 2.14 Aruba Division di Honor 2021-11-01 00:00:00 Estrella Britannia 1 7
1114367 1114367 Aucas - LDU Quito 1:0 2.75 3.44 2.36 Ecuador Liga Pro 2021-11-01 00:00:00 Aucas LDU Quito 1 0
1114368 1114368 Ocotal - Juventus Managua 1:0 2.49 3.12 2.6 Nicaragua Liga Primera 2021-11-01 00:00:00 Ocotal Juventus Managua 1 0
1114369 1114369 Jalapa - Real Madriz 0:1 2.29 3.15 2.82 Nicaragua Liga Primera 2021-11-01 00:00:00 Jalapa Real Madriz 0 1
1114370 1114370 Sporting San Jose - Grecia 2:1 2.28 3.29 2.94 Costa Rica Primera Division 2021-11-01 01:00:00 Sporting San Jose
I get the error when I run this:
m = df[['home_odds', 'draw_odds', 'away_odds']].agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
What action does the function perform?
Error:
File "C:/Users/harsh/AppData/Roaming/JetBrains/PyCharmCE2021.2/scratches/scratch_2.py", line 35, in <module>
m = df[['home_odds', 'draw_odds', 'away_odds']].agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\frame.py", line 8551, in aggregate
result = op.agg()
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\apply.py", line 715, in agg
result = self.obj.apply(self.orig_f, axis, args=self.args, **self.kwargs)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\frame.py", line 8741, in apply
return op.apply()
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\apply.py", line 688, in apply
return self.apply_standard()
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\apply.py", line 812, in apply_standard
results, res_index = self.apply_series_generator()
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\apply.py", line 828, in apply_series_generator
results[i] = self.f(v)
File "C:/Users/harsh/AppData/Roaming/JetBrains/PyCharmCE2021.2/scratches/scratch_2.py", line 35, in <lambda>
m = df[['home_odds', 'draw_odds', 'away_odds']].agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\generic.py", line 5487, in __getattr__
return object.__getattribute__(self, name)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\accessor.py", line 181, in __get__
accessor_obj = self._accessor(obj)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\strings\accessor.py", line 168, in __init__
self._inferred_dtype = self._validate(data)
File "C:\Users\harsh\OneDrive\Documents\VENV\lib\site-packages\pandas\core\strings\accessor.py", line 225, in _validate
raise AttributeError("Can only use .str accessor with string values!")
AttributeError: Can only use .str accessor with string values!
CodePudding user response:
The error is expected: the function counts / characters, so it needs strings in the columns 'home_odds', 'draw_odds', 'away_odds'. In your sample those columns hold numbers, so the .str accessor raises the error.
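As a minimal sketch of why this happens, the snippet below builds a small hypothetical frame with float odds (the values are copied from the question, the frame itself is an assumption), shows that .str fails on a float column, and shows that the same call works once the column is cast to strings.

```python
import pandas as pd

# Hypothetical sample: float odds, like the rows shown in the question.
df = pd.DataFrame({
    "home_odds": [2.67, 2.75],
    "draw_odds": [3.87, 3.44],
    "away_odds": [2.14, 2.36],
})

# The .str accessor is only available for string data, so this raises.
try:
    df["home_odds"].str.count("/")
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

# After casting to str the accessor works; no value contains a slash here.
counts = df["home_odds"].astype(str).str.count("/")

print(df.dtypes)
print(error_message)
print(counts)
```

If the full dataframe works but the subset does not, one plausible reason is that the full data contains some fractional odds stored as text, while the tested rows are purely numeric.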
All together:
df[['home_odds', 'draw_odds', 'away_odds']].agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
x.str.count('/') # count the / characters in each value
.agg(lambda x: x.str.count('/'), 1) # count per row in the specified columns
.ne(0) # test if the count is not 0, i.e. the value contains at least one /
.all(1) # test if all values are True per row, same as .all(axis=1)
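The chain can also be run one step at a time to see what each part returns. This is only a sketch on hypothetical string data (the fractional odds below are made up for illustration), and it uses a column-wise apply, which produces the same counts as the row-wise agg in the original line.

```python
import pandas as pd

cols = ["home_odds", "draw_odds", "away_odds"]

# Hypothetical sample: odds stored as text, some fractional, some decimal.
df = pd.DataFrame({
    "home_odds": ["5/2", "2.75", "7/4"],
    "draw_odds": ["3/1", "3.44", "3.10"],
    "away_odds": ["11/10", "2.36", "6/5"],
})

# Step 1: number of "/" characters in every cell.
counts = df[cols].apply(lambda s: s.str.count("/"))

# Step 2: True where a cell contains at least one "/".
has_slash = counts.ne(0)

# Step 3: True for rows where every odds column contains a "/".
m = has_slash.all(axis=1)

print(counts)
print(has_slash)
print(m)
```

Here only the first row has a slash in all three columns, so m is True for that row and False for the other two.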
If the columns can mix strings with numbers, add astype(str). The mask m is then True only for rows where all three columns contain a /:
m = df[['home_odds', 'draw_odds', 'away_odds']].astype(str).agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
To check the False values, select the rows where at least one of the specified columns has no /:
print (df.loc[~m, ['home_odds', 'draw_odds', 'away_odds']])
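For a mixed column, a short sketch of the same filter looks like this. The frame is hypothetical (one numeric row, one row of made-up fractional strings), and str.contains with regex=False is swapped in for str.count as a slightly more direct test; it is a variation, not the code from the answer above.

```python
import pandas as pd

cols = ["home_odds", "draw_odds", "away_odds"]

# Hypothetical sample: first row numeric, second row fractional strings.
df = pd.DataFrame({
    "home_odds": [2.67, "5/2"],
    "draw_odds": [3.87, "3/1"],
    "away_odds": [2.14, "11/10"],
})

# Cast everything to str so the .str accessor works on every value.
as_text = df[cols].astype(str)

# True for rows where all three columns contain a literal "/".
m = as_text.apply(lambda s: s.str.contains("/", regex=False)).all(axis=1)

# Rows where at least one odds column has no "/".
print(df.loc[~m, cols])
```

With this data the numeric row is the one selected by ~m.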