Home > Enterprise >  Creating a new column by applying conditions on the columns selected based on regular expression(as
Creating a new column by applying conditions on the columns selected based on regular expression(as

Time:01-27

I have a dataframe with columns as first ,col1,col2,col3,row1 import pandas as pd

df = pd.DataFrame({'first': ['1', '2', '3', '4', '5', '6', '7', '8'], 'col1': ['yes', 'yes', 'yes', 'yes', 'no', 'no', 'no', 'no'], 'col2': ['yes', 'yes', 'no', 'no' ,'yes', 'yes' ,'no', 'no'], 'col3': ['yes', 'no', 'yes', 'no','yes', 'no','yes', 'no'], 'row1':['4','5','7','3','7','3','79','8']})

first col1 col2 col3 row1 1 yes yes yes 4 2 yes yes no 5 3 yes no yes 7 4 yes no no 3 5 no yes yes 7 6 no yes no 3 7 no no yes 79 8 no no no 8

i want to apply conditions as below and create a new column as new_one

df.loc[((df['col1']=='no') & (df['col2']=='no') & (df['col3'] == 'no')),'new_one'] = 'no'
df.loc[~((df['col1']=='no') & (df['col2']=='no') & (df['col3'] == 'no')),'new_one'] = 'yes`

Problem: number of columns in the dataframe is dynamic and it is not constant as 3 in the above data frame but it matches with a pattern , here it is 'col*'

i want to select columns based up on the pattern(here col*) and apply the condition ('== 'no'') to each column and '&' them, and create a new column as new_one

i implemented the below code but not working

col_list=[]
for name in df.columns:
    if(re.search("col",name))
        col_list.append(name)
count=0
cond=''
for col in col_list:
   if(count==0):
      cond ="(df[\'" col "\'] == 'no')"
   else:
      cond ="&(df[\'" col "\'] == 'no')"
   count=count 1

df.loc[cond,'new_one'] = 'no'
df.loc[~cond,'new_one'] = 'no'

problem: cond is a str , i dont know how to use that as condition as df.loc expects cond to be series.i tried eval as well but no luck.please let me know any solution to this?

first col1 col2 col3 row1 new_one
1  yes  yes  yes    4     yes
2  yes  yes   no    5     yes
3  yes   no  yes    7     yes
4  yes   no   no    3     yes
5   no  yes  yes    7     yes
6   no  yes   no    3     yes
7   no   no  yes   79     yes
8   no   no   no    8      no

CodePudding user response:

Using lambda here you counting the number of time 'no' occurrence.

df['new'] = df[['col1','col2','col3']].T.apply(lambda r: 'no' if list(r).count('no')==3 else 'yes')

CodePudding user response:

Use all/any:

df.loc[df[['col1', 'col2', 'col3']].eq('no').all(axis=1), 'new_one'] = 'no'
df.loc[df[['col1', 'col2', 'col3']].ne('no').any(axis=1), 'new_one'] = 'yes'

Or, for both yes/no in a single command, numpy.where:

df['new_one'] = np.where(df[['col1', 'col2', 'col3']].eq('no').all(axis=1),
                         'no', 'yes')

Output:

  first col1 col2 col3 row1 new_one
0     1  yes  yes  yes    4     yes
1     2  yes  yes   no    5     yes
2     3  yes   no  yes    7     yes
3     4  yes   no   no    3     yes
4     5   no  yes  yes    7     yes
5     6   no  yes   no    3     yes
6     7   no   no  yes   79     yes
7     8   no   no   no    8      no

dynamic number of columns:

mask = df.filter(like='col').eq('no').all(axis=1)

df['new_one'] = np.where(mask, 'no', 'yes')
  • Related