Add a new column with count depending on occurence within a list in pandas-CodePudding

I have a dataframse such as

Groups SP   Letters
G1     SP1  A
G1     SP1  Z
G1     SP1  A
G1     SP2  X
G1     SP3  X
G1     SP4  B
G2     SP2  A
G2     SP5  B
G2     SP3  X
G2     SP23 A

and I would like to add a column (Nb_column) which would be the count of unique SP value within each Groups but where Letters are in a list the_list=['A','B']

here I should then get:

Groups SP   Letters Nb_column
G1     SP1  A       2
G1     SP1  Z       2
G1     SP1  A       2
G1     SP2  X       2
G1     SP3  X       2
G1     SP4  B       2
G2     SP2  A       3
G2     SP5  B       3
G2     SP3  X       3
G2     SP23 A       3

Where

there are 2 SP (SP1,SP4) with a Letters in the_list for the G1. And
there are 3 SP (SP2,SP5 and SP23) with a Letters in the_list for the G2.

Does someone have an idea in pandas please ?

CodePudding user response：

You can use groupby transform nunique on a masked version of SP (by default the NaNs are not counted by nunique):

df['Nb_columns'] = (df['SP'].where(df['Letters'].isin(the_list))
                     .groupby(df['Groups']).transform('nunique')
                    )

output:

  Groups    SP Letters  Nb_columns
0     G1   SP1       A           2
1     G1   SP1       Z           2
2     G1   SP1       A           2
3     G1   SP2       X           2
4     G1   SP3       X           2
5     G1   SP4       B           2
6     G2   SP2       A           3
7     G2   SP5       B           3
8     G2   SP3       X           3
9     G2  SP23       A           3

CodePudding user response：

Use isin to check for the existence, then groupby().nunique() and map back:

nb = df[df['Letters'].isin(lst)].groupby('Groups')['SP'].nunique()

df['Nb_column'] = df['Groups'].map(nb)

CodePudding user response：

We can select from the SP column based on where Letters isin the_list using loc then use groupby nunique to get the corresponding number of unique values per group. Then rename and join back to the DataFrame to make the new column aligned with Groups:

df = df.join(
    df.loc[df['Letters'].isin(the_list), 'SP']
        .groupby(df['Groups']).nunique()
        .rename('Nb_column'),
    on='Groups'
)

Alternatively to join we can reindex to scale and create the new column from that:

df['Nb_column'] = (
    df.loc[df['Letters'].isin(the_list), 'SP']
        .groupby(df['Groups']).transform('nunique')
        .reindex(index=df.index, method='ffill')
)

df:

  Groups    SP Letters  Nb_column
0     G1   SP1       A          2
1     G1   SP1       Z          2
2     G1   SP1       A          2
3     G1   SP2       X          2
4     G1   SP3       X          2
5     G1   SP4       B          2
6     G2   SP2       A          3
7     G2   SP5       B          3
8     G2   SP3       X          3
9     G2  SP23       A          3