I have a dataframse such as
Groups SP Letters
G1 SP1 A
G1 SP1 Z
G1 SP1 A
G1 SP2 X
G1 SP3 X
G1 SP4 B
G2 SP2 A
G2 SP5 B
G2 SP3 X
G2 SP23 A
and I would like to add a column (Nb_column
) which would be the count of unique SP
value within each Groups
but where Letters
are in a list the_list=['A','B']
here I should then get:
Groups SP Letters Nb_column
G1 SP1 A 2
G1 SP1 Z 2
G1 SP1 A 2
G1 SP2 X 2
G1 SP3 X 2
G1 SP4 B 2
G2 SP2 A 3
G2 SP5 B 3
G2 SP3 X 3
G2 SP23 A 3
Where
- there are 2
SP
(SP1,SP4) with aLetters
inthe_list
for the G1. And - there are 3
SP
(SP2,SP5 and SP23) with aLetters
inthe_list
for the G2.
Does someone have an idea in pandas please ?
CodePudding user response:
You can use groupby
transform
nunique
on a masked version of SP (by default the NaNs are not counted by nunique
):
df['Nb_columns'] = (df['SP'].where(df['Letters'].isin(the_list))
.groupby(df['Groups']).transform('nunique')
)
output:
Groups SP Letters Nb_columns
0 G1 SP1 A 2
1 G1 SP1 Z 2
2 G1 SP1 A 2
3 G1 SP2 X 2
4 G1 SP3 X 2
5 G1 SP4 B 2
6 G2 SP2 A 3
7 G2 SP5 B 3
8 G2 SP3 X 3
9 G2 SP23 A 3
CodePudding user response:
Use isin
to check for the existence, then groupby().nunique()
and map back:
nb = df[df['Letters'].isin(lst)].groupby('Groups')['SP'].nunique()
df['Nb_column'] = df['Groups'].map(nb)
CodePudding user response:
We can select from the SP
column based on where Letters
isin
the_list
using loc
then use groupby nunique
to get the corresponding number of unique values per group. Then rename
and join
back to the DataFrame to make the new column aligned with Groups
:
df = df.join(
df.loc[df['Letters'].isin(the_list), 'SP']
.groupby(df['Groups']).nunique()
.rename('Nb_column'),
on='Groups'
)
Alternatively to join
we can reindex
to scale and create the new column from that:
df['Nb_column'] = (
df.loc[df['Letters'].isin(the_list), 'SP']
.groupby(df['Groups']).transform('nunique')
.reindex(index=df.index, method='ffill')
)
df
:
Groups SP Letters Nb_column
0 G1 SP1 A 2
1 G1 SP1 Z 2
2 G1 SP1 A 2
3 G1 SP2 X 2
4 G1 SP3 X 2
5 G1 SP4 B 2
6 G2 SP2 A 3
7 G2 SP5 B 3
8 G2 SP3 X 3
9 G2 SP23 A 3