I have a dataframe such as :
The_list=["A","B"]
Groups COL2
G1 A
G1 B
G1 C
G2 A
G2 D
G3 A
G3 B
G4 A
G5 B
and I would like to add another COL3
column where I put yes for each Groups
where all element from The_list are present within the COL2
, and no if not.
I should then get :
Groups COL2 COL3
G1 A yes
G1 B yes
G1 C yes
G2 A no
G2 D no
G3 A yes
G3 B yes
G4 A no
G5 B no
CodePudding user response:
Use Groupby.apply
with set intersection
and np.where
:
# Group on column 'Groups' and apply set intersection and find length of set
In [1178]: out = df.groupby('Groups')['COL2'].apply(lambda x: len(set(x) & set(The_list))).reset_index()
# If length of set is equal to the length of the list, then 'yes' else 'no'
In [1179]: import numpy as np
In [1180]: out['COL3'] = np.where(out.COL2.eq(len(The_list), 'yes', 'no')
# Merge original df with above and populate COL3
In [1185]: df = df.merge(out[['Groups','COL3']])
In [1186]: df
Out[1186]:
Groups COL2 COL3
0 G1 A yes
1 G1 B yes
2 G1 C yes
3 G2 A no
4 G2 D no
5 G3 A yes
6 G3 B yes
7 G4 A no
8 G5 B no
CodePudding user response:
First make sure that Groups is the index. If not:
df = df.set_index("Groups")
Initialize COL3:
df["COL3"] = "yes"
Then make a list of the COL2 values for each group and change COL3 to "no" if an element from the list is missing:
for group in set(df.index):
COL2_values = list(df.loc[group, "COL2"])
for element in The_list:
if element not in COL2_values:
df.loc[group, "COL3"] = 'no'
break
CodePudding user response:
import pandas as pd
df = pd.read_clipboard()
df
'''
Groups COL2
0 G1 A
1 G1 B
2 G1 C
3 G2 A
4 G2 D
5 G3 A
6 G3 B
7 G4 A
8 G5 B
'''
The_list=["A","B"]
tag = df.groupby('Groups').apply(lambda x: all([i in x.COL2.to_list() for i in The_list]))
tag
'''
Groups
G1 True
G2 False
G3 True
G4 False
G5 False
dtype: bool
'''
df.assign(COL3=df.Groups.map(tag).map({True:'yes', False:'no'}))
'''
Groups COL2 COL3
0 G1 A yes
1 G1 B yes
2 G1 C yes
3 G2 A no
4 G2 D no
5 G3 A yes
6 G3 B yes
7 G4 A no
8 G5 B no
'''