I found a similar question here, but it did not help me because my case is different.
I have a huge dataframe that looks more or less like the example below:
x y st mt ast sr c7 z w
0 mt 2 1 4 2 2 a yes
1 b 3 3 3 3 3 yes
2 1 1 2 4 3 yes
3 d 3 3 1 2 4 d
4 e 2 3 2 1 4
5 nlp 5 5 5 5 5 f yes
my goal is to create another column named "other_flagged" based on the following condition:
'if columns x, y and z are NOT filled with a string, but w has a "yes", fill column "other_flagged" with the string 'flagged'. So, my desired output would be:
x y st mt ast sr c7 z w other flagged
0 mt 2 1 4 2 2 a yes
1 b 3 3 3 3 3 yes
2 1 1 2 4 3 yes flagged
3 d 3 3 1 2 4 d
4 e 2 3 2 1 4
5 nlp 5 5 5 5 5 f yes
in this case, only row 2 receives the string "flagged" because x, y and z are empty but column w has the string 'yes'. I tried the code below but the new column created is being completely populated with the string 'flagged' but not based on the conditions:
survey.loc[(survey['x'] != '') & (survey['y'] !='') & (survey['z'] != '') &
(survey['w'] == 'yes'), 'other_flagged'] ='flagged'
also tried:
survey.loc[(survey['x'] != '[a-zA-Z0-9-] ') & (survey['y'] != '[a-zA-Z0-9-] ') &
(survey['z'] != '[a-zA-Z0-9-] ') & (survey['w'] == 'yes'), 'other_flagged']='flagged'
How can I achieve this goal?
CodePudding user response:
Use if not exist values are empty strings is possible test all columns together use DataFrame.all
:
m = (survey[['x', 'y', 'z']] == '').all(axis=1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'
print (survey)
x y st mt ast sr c7 z w other_flagged
0 m t 2 1 4 2 2 a yes NaN
1 b 3 3 3 3 3 yes NaN
2 1 1 2 4 3 yes flagged
3 d 3 3 1 2 4 d NaN
4 e 2 3 2 1 4 NaN
5 nlp 5 5 5 5 5 f yes NaN
Your solution - change !=
to ==
:
m = (survey['x'] == '') & (survey['y'] =='') & (survey['z'] == '') & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'
If values are missing values:
m = survey[['x', 'y', 'z']].isna().all(1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'
print (survey)
x y st mt ast sr c7 z w other_flagged
0 m t 2 1 4 2 2 a yes NaN
1 NaN b 3 3 3 3 3 NaN yes NaN
2 NaN NaN 1 1 2 4 3 NaN yes flagged
3 NaN d 3 3 1 2 4 d NaN NaN
4 NaN e 2 3 2 1 4 NaN NaN NaN
5 nlp NaN 5 5 5 5 5 f yes NaN
CodePudding user response:
Use numpy.where
:
import numpy as np
survey['other flagged'] = np.where((survey['x'].eq('') & survey['y'].eq('') & survey['z'].eq('') & survey['w'].eq('yes')) , 'flagged', 'not flagged')