Home > front end >  created a new column in a pandas dataframe based on multiple different conditions
created a new column in a pandas dataframe based on multiple different conditions

Time:05-20

I found a similar question here, but it did not help me because my case is different.

I have a huge dataframe that looks more or less like the example below:

        x   y   st  mt  ast sr  c7  z   w
    0   mt      2   1   4   2   2   a   yes
    1       b   3   3   3   3   3       yes
    2           1   1   2   4   3       yes
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

my goal is to create another column named "other_flagged" based on the following condition:

'if columns x, y and z are NOT filled with a string, but w has a "yes", fill column "other_flagged" with the string 'flagged'. So, my desired output would be:

        x   y   st  mt  ast sr  c7  z   w   other flagged
    0   mt      2   1   4   2   2   a   yes     
    1       b   3   3   3   3   3       yes    
    2           1   1   2   4   3       yes     flagged
    3       d   3   3   1   2   4   d   
    4       e   2   3   2   1   4       
    5   nlp     5   5   5   5   5   f   yes

in this case, only row 2 receives the string "flagged" because x, y and z are empty but column w has the string 'yes'. I tried the code below but the new column created is being completely populated with the string 'flagged' but not based on the conditions:

 survey.loc[(survey['x'] != '') & (survey['y'] !='') & (survey['z'] != '') & 
(survey['w'] == 'yes'), 'other_flagged'] ='flagged'

also tried:

survey.loc[(survey['x'] != '[a-zA-Z0-9-] ') & (survey['y'] != '[a-zA-Z0-9-] ') & 
(survey['z'] != '[a-zA-Z0-9-] ') & (survey['w'] == 'yes'), 'other_flagged']='flagged'

How can I achieve this goal?

CodePudding user response:

Use if not exist values are empty strings is possible test all columns together use DataFrame.all:

m = (survey[['x', 'y', 'z']] == '').all(axis=1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x  y  st  mt  ast  sr  c7  z    w other_flagged
0    m  t   2   1    4   2   2  a  yes           NaN
1       b   3   3    3   3   3     yes           NaN
2           1   1    2   4   3     yes       flagged
3       d   3   3    1   2   4  d                NaN
4       e   2   3    2   1   4                   NaN
5  nlp      5   5    5   5   5  f  yes           NaN

Your solution - change != to ==:

m = (survey['x'] == '') & (survey['y'] =='') & (survey['z'] == '') & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

If values are missing values:

m = survey[['x', 'y', 'z']].isna().all(1) & (survey['w'] == 'yes')
survey.loc[m, 'other_flagged'] ='flagged'

print (survey)
     x    y  st  mt  ast  sr  c7    z    w other_flagged
0    m    t   2   1    4   2   2    a  yes           NaN
1  NaN    b   3   3    3   3   3  NaN  yes           NaN
2  NaN  NaN   1   1    2   4   3  NaN  yes       flagged
3  NaN    d   3   3    1   2   4    d  NaN           NaN
4  NaN    e   2   3    2   1   4  NaN  NaN           NaN
5  nlp  NaN   5   5    5   5   5    f  yes           NaN

CodePudding user response:

Use numpy.where:

import numpy as np

survey['other flagged'] = np.where((survey['x'].eq('') & survey['y'].eq('') & survey['z'].eq('') & survey['w'].eq('yes')) , 'flagged', 'not flagged')
  • Related