I have a dataframe such as :
Groups Name Value
G1 BLOC_Homo_sapiens 100
G1 BLOC_Chimpenzee 99
G1 BLOC_Bonobo 80
G1 Canis_lupus 20
G1 Danio_rerio 10
G2 BLOC_Homo_sapiens 30
G2 BLOC_Bonobo 29
G2 Mus_musculus 28
G2 Cules_pupiens 26
G3 BLOC_Gorrilla 300
G3 Cimex_lectularius 10
G3 Bombus_terrestris 9
And I would like to add a new column called "diff_length
" for each Groups
where I substract the highest Value
of the Name
containing the pattern "BLOC
" against the highest Value of the Name
which does not contain the pattern "BLOC
".
For the Groups1 for instance, the highest Value
with the BLOC is 100, and the highest Value
without BLOC is 20. So the result is 100-20 = 80
.
I should then get :
Groups Name Value diff_length
G1 BLOC_Homo_sapiens 100 80
G1 BLOC_Chimpenzee 99 80
G1 BLOC_Bonobo 80 80
G1 Canis_lupus 20 80
G1 Danio_rerio 10 80
G2 BLOC_Homo_sapiens 30 2
G2 BLOC_Bonobo 29 2
G2 Mus_musculus 28 2
G2 Cules_pupiens 26 2
G3 BLOC_Gorrilla 300 290
G3 Cimex_lectularius 10 290
G3 Bombus_terrestris 9 290
CodePudding user response:
You can use:
m = df['Name'].str.contains('BLOC')
df['diff_length'] = (df.groupby('Groups')['Value']
.transform(lambda d: d.where(m).max() - d.mask(m).max())
)
NB. this assumes unique indices.
Output:
Groups Name Value diff_length
0 G1 BLOC_Homo_sapiens 100 80.0
1 G1 BLOC_Chimpenzee 99 80.0
2 G1 BLOC_Bonobo 80 80.0
3 G1 Canis_lupus 20 80.0
4 G1 Danio_rerio 10 80.0
5 G2 BLOC_Homo_sapiens 30 2.0
6 G2 BLOC_Bonobo 29 2.0
7 G2 Mus_musculus 28 2.0
8 G2 Cules_pupiens 26 2.0
9 G3 BLOC_Gorrilla 300 290.0
10 G3 Cimex_lectularius 10 290.0
11 G3 Bombus_terrestris 9 290.0
Alternative syntax:
m = df['Name'].str.contains('BLOC')
df['diff_length'] = (
df['Value'].where(m).groupby(df['Groups']).transform('max')
-df['Value'].mask(m).groupby(df['Groups']).transform('max')
)
CodePudding user response:
here is one way to do it
# identify rows that contains the block
m1=df['Name'].str.contains('BLOC')
# groupby on Groups and the rows that has BLOC and ones that don't
# take the max for each (TRUE/FALSE) in a group, and take diff
df2=df.groupby(['Groups', m1 ] )['Value'].max().diff().reset_index()
# create a dictionary
d=dict(df2[df2['Name'].eq(True)][['Groups','Value']].values)
# map difference back to the df
df['diff_length'] = df['Groups'].map(d)
df
Groups Name Value diff_length
0 G1 BLOC_Homo_sapiens 100 80.0
1 G1 BLOC_Chimpenzee 99 80.0
2 G1 BLOC_Bonobo 80 80.0
3 G1 Canis_lupus 20 80.0
4 G1 Danio_rerio 10 80.0
5 G2 BLOC_Homo_sapiens 30 2.0
6 G2 BLOC_Bonobo 29 2.0
7 G2 Mus_musculus 28 2.0
8 G2 Cules_pupiens 26 2.0
9 G3 BLOC_Gorrilla 300 290.0
10 G3 Cimex_lectularius 10 290.0
11 G3 Bombus_terrestris 9 290.0