I have a dataframe which looks something like this:
dfA
name group country registration
X engg Thailand True
A engg Peru True
B engg Nan False
H IT Nan False
J IT India False
K Food Nan True
Z Food Nan False
I want to add two new columns based on the grouping of the group column but considering the values of the country and registration column. The new dataframe should look like this:
dfB
name group country registration value_country value registration
X engg Thailand True True True
A engg Peru True True True
B engg Nan False True True
H IT Nan False True False
J IT India False True False
K Food Nan True False True
Z Food Nan False False True
The value_country column is formed by grouping the "group" column agaisnt country to check for every group is there is even a single country value we assign the complete group value to be True and similarly for value_registartion in the "group" column if any group has a single True value the entire group has the value True else False. How do I do this?
I can use the pandas.groupby() funtion for this but how do I apply a condition for checking values in other columns as one is string column (country) the other is a boolean column(registration)?
CodePudding user response:
Use GroupBy.transform
with GroupBy.any
for test if at least one non missing values in country
for value_country
and for test if at least one True
in registration
for value registration
column:
df = df.replace('Nan', np.nan)
df['value_country'] = df['country'].notna().groupby(df['group']).transform('any')
df['value registration'] = df.groupby('group')['registration'].transform('any')
print (df)
name group country registration value_country value registration
0 X engg Thailand True True True
1 A engg Peru True True True
2 B engg NaN False True True
3 H IT NaN False True False
4 J IT India False True False
5 K Food NaN True False True
6 Z Food NaN False False True
Both together:
df[['value_country', 'value registration']] = (df.assign(new = df['country'].notna())
.groupby('group')[['new','registration']]
.transform('any'))