Add another column based on the value of two columns

I am trying to add another column whose value depends on two existing columns. Here is a minimal version of my dataframe:

import numpy as np
import pandas as pd

data = {'current_pair': ['"["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"', '"["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"', '"["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"','"["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"', '"["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"'],
        'B': [1, 0, 1, 1, 0]
        }
df = pd.DataFrame(data)
df

                                    current_pair    B
0   "["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"   1
1   "["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"   0
2   "["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"   1
3   "["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"   1
4   "["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"   0

I want the result to be:

                                    current_pair    B   C
0   "["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"   1   1
1   "["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"   0   2
2   "["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"   1   0
3   "["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"   1   1
4   "["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"   0   2

I used numpy.select:

conditions=[(df['B']==1 & df['current_pair'].str.contains('Emo/', na=False)),
            (df['B']==1 & df['current_pair'].str.contains('Neu/', na=False)),
            df['B']==0]
choices = [0, 1, 2]
df['C'] = np.select(conditions, choices, default=np.nan)

Unfortunately, this produces the following dataframe, in which no row gets a 1 in column C:

                                    current_pair    B   C
0   "["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"   1   0
1   "["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"   0   2
2   "["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"   1   0
3   "["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"   1   0
4   "["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"   0   2

Any help is appreciated. Thanks a lot!

Answer:

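The conditions can also be reordered. np.select uses the first condition that is True for each row, so if B == 0 is checked first, the substring tests only have to handle the rows where B == 1: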

# assign returns a new DataFrame, so keep the result
df = df.assign(C=np.select([df.B==0, df.current_pair.str.contains('Emo/'), df.current_pair.str.contains('Neu/')], [2,0,1]))
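To make the first-match rule concrete, here is a minimal sketch that rebuilds the question's sample frame and applies the reordered conditions. The default of -1 is an arbitrary sentinel chosen for this sketch so that any row matching no condition would stand out; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Rebuild the question's sample frame
df = pd.DataFrame({
    'current_pair': ['"["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"',
                     '"["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"',
                     '"["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"',
                     '"["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"',
                     '"["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"'],
    'B': [1, 0, 1, 1, 0],
})

# Order matters: B == 0 is tested first, so the substring tests
# only decide the rows where B == 1
conditions = [
    df['B'] == 0,
    df['current_pair'].str.contains('Emo/', na=False),
    df['current_pair'].str.contains('Neu/', na=False),
]
# -1 is an arbitrary sentinel for rows that match no condition
df['C'] = np.select(conditions, [2, 0, 1], default=-1)
print(df['C'].tolist())  # [1, 2, 0, 1, 2]
```

With integer choices and an integer default, column C stays an integer column instead of becoming float, which happens when default=np.nan is used.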

Answer:

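The problem is operator precedence: in Python, & binds more tightly than ==, so an expression like B == 1 & mask is evaluated as B == (1 & mask) rather than (B == 1) & mask. Wrap each comparison in parentheses: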

conditions=[(df['B']==1) & df['current_pair'].str.contains('Emo/', na=False),
            (df['B']==1) & df['current_pair'].str.contains('Neu/', na=False),
            df['B']==0]
choices = [0, 1, 2]
df['C'] = np.select(conditions, choices, default=np.nan)
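The effect of the missing parentheses can be seen on a few toy values. The Series below are made up for illustration and are not the question's data. The last row has B == 0 and no "Emo/" match, yet the unparenthesized version still marks it True, because 0 == (1 & False) holds.

```python
import pandas as pd

# Toy values for illustration only
b = pd.Series([1, 0, 1, 0])
is_emo = pd.Series([False, True, True, False])

# Parsed as b == (1 & is_emo): compares B with the result of 1 & is_emo
wrong = b == 1 & is_emo
# Intended: two boolean masks combined element-wise
right = (b == 1) & is_emo

print(wrong.tolist())  # [False, False, True, True]
print(right.tolist())  # [False, False, True, False]
```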

Answer:

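Here is a more general approach: put the rules in a plain Python function and call it row by row with DataFrame.apply. This is easy to extend to more complex cases, but keep execution speed in mind, since row-wise apply is much slower than a vectorized np.select on large frames: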

import pandas as pd

df = pd.DataFrame({'col_1': ['Abc', 'Xcd', 'Afs', 'Xtf', 'Aky'], 'col_2': [1, 2, 3, 4, 5]})

def some_logic(col_1, col_2):
    # Arbitrary Python rules evaluated for a single row
    if 'A' in col_1 and col_2 == 1:
        return 111
    elif 'X' in col_1 and col_2 == 4:
        return 999
    return 888

# axis=1 passes each row to the lambda; scalar return values produce a Series
df['NewCol'] = df.apply(lambda row: some_logic(row.col_1, row.col_2), axis=1)
print(df)
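Applied to the question's rule, the same row-wise pattern might look like the sketch below. classify_pair is a hypothetical helper name introduced here, and returning np.nan when neither substring is found is an assumption that mirrors default=np.nan in the question. The result is compared against the parenthesized np.select version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'current_pair': ['"["StimusNeu/2357.jpg","StimusNeu/5731.jpg"]"',
                     '"["StimusEmo/6350.jpg","StimusEmo/3230.jpg"]"',
                     '"["StimusEmo/3215.jpg","StimusEmo/9570.jpg"]"',
                     '"["StimusNeu/7020.jpg","StimusNeu/7547.jpg"]"',
                     '"["StimusNeu/7080.jpg","StimusNeu/7179.jpg"]"'],
    'B': [1, 0, 1, 1, 0],
})

def classify_pair(pair, b):
    """Hypothetical helper: 2 if B == 0, else 0 for Emo pairs and 1 for Neu pairs."""
    if b == 0:
        return 2
    if 'Emo/' in pair:
        return 0
    if 'Neu/' in pair:
        return 1
    return np.nan  # neither substring found (assumed, mirrors default=np.nan)

# Row-wise: readable and easy to extend, but calls Python once per row
df['C_apply'] = df.apply(lambda row: classify_pair(row['current_pair'], row['B']), axis=1)

# Vectorized equivalent with parentheses around each comparison
conditions = [
    (df['B'] == 1) & df['current_pair'].str.contains('Emo/', na=False),
    (df['B'] == 1) & df['current_pair'].str.contains('Neu/', na=False),
    df['B'] == 0,
]
df['C_select'] = np.select(conditions, [0, 1, 2], default=np.nan)

print(df[['B', 'C_apply', 'C_select']])
```

Both columns hold the same values; C_select is float because np.nan is used as the default.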