Home > Back-end >  New variable in pandas conditioned on two variables where one variable transcend multiple rows
New variable in pandas conditioned on two variables where one variable transcend multiple rows

Time:01-19

I want to add a column col3 to my data frame df with the binary outcome yes or no .

The issue is that the values in col3 should be conditioned on col1 and col2 in the sense that the outcome will be yes if the value for col2 is also yes for all instances of a unique value in col1. In case one or more values are no in col2 then the corresponding row in col3 should also be no.

A simple example of the logic.

import pandas as pd
df={"col1": [1,1,1,2,3,3,4,4], "col2": ["yes","no","yes","no","yes","yes","yes","no"]}
df = pd.DataFrame(data=df)
   col1 col2
0     1  yes
1     1   no
2     1  yes
3     2   no
4     3  yes
5     3  yes
6     4  yes
7     4   no

The desired outcome.

df_new
   col1 col2 col3
0     1  yes   no
1     1   no   no
2     1  yes   no
3     2   no   no
4     3  yes  yes
5     3  yes  yes
6     4  yes   no
7     4   no   no

CodePudding user response:

You can use an ordered CategoricalDtype to accomplish what you want:

cat = pd.CategoricalDtype(['yes', 'no'], ordered=True)
df['col3'] =  df.astype({'col2': cat}).groupby('col1').transform('max').astype(str)
print(df)

# Output
   col1 col2 col3
0     1  yes   no
1     1   no   no
2     1  yes   no
3     2   no   no
4     3  yes  yes
5     3  yes  yes
6     4  yes   no
7     4   no   no

The same using lexicographical order (yes < no):

df['col3'] = df.groupby('col1')['col2'].transform('min')
print(df)

# Output
   col1 col2 col3
0     1  yes   no
1     1   no   no
2     1  yes   no
3     2   no   no
4     3  yes  yes
5     3  yes  yes
6     4  yes   no
7     4   no   no

The advantage of CategoricalDtype is you have to be explicit in the order of data (The Zen of Python)

CodePudding user response:

Another possible solution:

df.assign(col3=df.groupby('col1')['col2'].transform(
    lambda x: np.where(x.eq('yes').all(), 'yes', 'no')))

Output:

   col1 col2 col3
0     1  yes   no
1     1   no   no
2     1  yes   no
3     2   no   no
4     3  yes  yes
5     3  yes  yes
6     4  yes   no
7     4   no   no
  • Related