Home > Enterprise >  Grouping a column based on values on other columns to create new columns in pandas
Grouping a column based on values on other columns to create new columns in pandas

Time:09-28

I have a dataframe which looks something like this:

dfA
name      group      country       registration
X         engg       Thailand      True
A         engg       Peru          True
B         engg       Nan           False
H         IT         Nan           False
J         IT         India         False
K         Food       Nan           True
Z         Food       Nan           False

I want to add two new columns based on the grouping of the group column but considering the values of the country and registration column. The new dataframe should look like this:

dfB
name      group      country       registration      value_country       value registration
X         engg       Thailand      True                  True                 True
A         engg       Peru          True                  True                 True
B         engg       Nan           False                 True                 True
H         IT         Nan           False                 True                 False
J         IT         India         False                 True                 False
K         Food       Nan           True                  False                True
Z         Food       Nan           False                 False                True

The value_country column is formed by grouping the "group" column agaisnt country to check for every group is there is even a single country value we assign the complete group value to be True and similarly for value_registartion in the "group" column if any group has a single True value the entire group has the value True else False. How do I do this?

I can use the pandas.groupby() funtion for this but how do I apply a condition for checking values in other columns as one is string column (country) the other is a boolean column(registration)?

CodePudding user response:

Use GroupBy.transform with GroupBy.any for test if at least one non missing values in country for value_country and for test if at least one True in registration for value registration column:

df = df.replace('Nan', np.nan)

df['value_country'] = df['country'].notna().groupby(df['group']).transform('any')
df['value registration'] = df.groupby('group')['registration'].transform('any')
print (df)
  name group   country  registration  value_country  value registration
0    X  engg  Thailand          True           True                True
1    A  engg      Peru          True           True                True
2    B  engg       NaN         False           True                True
3    H    IT       NaN         False           True               False
4    J    IT     India         False           True               False
5    K  Food       NaN          True          False                True
6    Z  Food       NaN         False          False                True

Both together:

df[['value_country', 'value registration']] = (df.assign(new = df['country'].notna())
                                                 .groupby('group')[['new','registration']]
                                                 .transform('any'))
  • Related