Filtering exact match and concatenating in a new column in pandas-CodePudding

I have df as

acc_id  acc_status  username    TeamMemRole             email
A1      Inactive    A           Serv Primary            [email protected]
A1      Inactive    B           Sales - Primary         [email protected]
A1      Inactive    C           Ent Specialist          [email protected]
A2      Inactive    D           Sales - Primary         [email protected]
B1      Inactive    E           Tech Sales - Primary    [email protected]
B1      Inactive    F           Ent Specialist          [email protected]
B1      Inactive    G           Sales - Primary         [email protected]
C1      Inactive    H           Sales - Primary         [email protected]
C1      Inactive    I           Sales - Secondary       [email protected]
C1      Inactive    J           Tech Sales - Primary    [email protected]

I want all the the emails expect Sales - Primary for that particular acc_id to be concatenated and be in a separate column and the emails of only Sales - Primary in a separate column as below

acc_id  acc_status    username     TeamMemRole            email       To         CC
A1      Inactive      A            Serv Primary           [email protected]    [email protected]  [email protected],[email protected]
A1      Inactive      B            Sales - Primary        [email protected]      
A1      Inactive      C            Ent Specialist         [email protected]      
A2      Inactive      D            Sales - Primary        [email protected]    [email protected]  
B1      Inactive      E            Tech Sales - Primary   [email protected]    [email protected]  [email protected],[email protected] 
B1      Inactive      F            Ent Specialist         [email protected]      
B1      Inactive      G            Sales - Primary        [email protected]      
C1      Inactive      H            Sales - Primary        [email protected]    [email protected]  [email protected],[email protected]
C1      Inactive      I            Sales - Secondary      [email protected]      
C1      Inactive      J            Tech Sales - Primary   [email protected]

I tried using

df[df['TeamMemRole'].str.contains('Sales - Primary')==False]

but this is removing Tech Sales - Primary as well.

Please help with any suggestions and approaches

CodePudding user response：

Let's try:

# first row for each group
first_rows = ~df['acc_id'].duplicated()

# primary sale rows
prim_sale_rows = df['TeamMemRole'].eq('Sales - Primary')

# To column
prim_sales = df.loc[prim_sale_rows].groupby('acc_id')['email'].apply(', '.join)
df.loc[first_rows, 'To'] = df['acc_id'].map(prim_sales)

# CC column
CC_list = df.loc[~prim_sale_rows].groupby('acc_id')['email'].apply(', '.join)
df.loc[first_rows, 'CC'] = df['acc_id'].map(CC_list)

Output:

  acc_id acc_status username           TeamMemRole     email        To                  CC
0     A1   Inactive        A          Serv Primary  [email protected]  [email protected]  [email protected], [email protected]
1     A1   Inactive        B       Sales - Primary  [email protected]       NaN                 NaN
2     A1   Inactive        C        Ent Specialist  [email protected]       NaN                 NaN
3     A2   Inactive        D       Sales - Primary  [email protected]  [email protected]                 NaN
4     B1   Inactive        E  Tech Sales - Primary  [email protected]  [email protected]  [email protected], [email protected]
5     B1   Inactive        F        Ent Specialist  [email protected]       NaN                 NaN
6     B1   Inactive        G       Sales - Primary  [email protected]       NaN                 NaN
7     C1   Inactive        H       Sales - Primary  [email protected]  [email protected]  [email protected], [email protected]
8     C1   Inactive        I     Sales - Secondary  [email protected]       NaN                 NaN
9     C1   Inactive        J  Tech Sales - Primary  [email protected]       NaN                 NaN