I want to add a column col3
to my data frame df
with the binary outcome yes
or no
.
The issue is that the values in col3
should be conditioned on col1
and col2
in the sense that the outcome will be yes
if the value for col2
is also yes
for all instances of a unique value in col1
. In case one or more values are no
in col2
then the corresponding row in col3
should also be no
.
A simple example of the logic.
import pandas as pd
df={"col1": [1,1,1,2,3,3,4,4], "col2": ["yes","no","yes","no","yes","yes","yes","no"]}
df = pd.DataFrame(data=df)
col1 col2
0 1 yes
1 1 no
2 1 yes
3 2 no
4 3 yes
5 3 yes
6 4 yes
7 4 no
The desired outcome.
df_new
col1 col2 col3
0 1 yes no
1 1 no no
2 1 yes no
3 2 no no
4 3 yes yes
5 3 yes yes
6 4 yes no
7 4 no no
CodePudding user response:
You can use an ordered CategoricalDtype
to accomplish what you want:
cat = pd.CategoricalDtype(['yes', 'no'], ordered=True)
df['col3'] = df.astype({'col2': cat}).groupby('col1').transform('max').astype(str)
print(df)
# Output
col1 col2 col3
0 1 yes no
1 1 no no
2 1 yes no
3 2 no no
4 3 yes yes
5 3 yes yes
6 4 yes no
7 4 no no
The same using lexicographical order (yes < no):
df['col3'] = df.groupby('col1')['col2'].transform('min')
print(df)
# Output
col1 col2 col3
0 1 yes no
1 1 no no
2 1 yes no
3 2 no no
4 3 yes yes
5 3 yes yes
6 4 yes no
7 4 no no
The advantage of CategoricalDtype
is you have to be explicit in the order of data (The Zen of Python)
CodePudding user response:
Another possible solution:
df.assign(col3=df.groupby('col1')['col2'].transform(
lambda x: np.where(x.eq('yes').all(), 'yes', 'no')))
Output:
col1 col2 col3
0 1 yes no
1 1 no no
2 1 yes no
3 2 no no
4 3 yes yes
5 3 yes yes
6 4 yes no
7 4 no no