import pandas as pd

df = pd.DataFrame({'ID': ['A','A','A','A','A'],
                   'target': ['B','B','B','B','C'],
                   'length': [208,315,1987,3775,200],
                   'start': [139403,140668,141726,143705,108],
                   'end': [139609,140982,143711,147467,208]})
  ID target  length   start     end
0  A      B     208  139403  139609
1  A      B     315  140668  140982
2  A      B    1987  141726  143711
3  A      B    3775  143705  147467
4  A      C     200     108     208
If I perform the operation:
(df.assign(length=
df['start'].lt(df['end'].shift())
.mul(df['start']-df['end'].shift(fill_value=0))
.add(df['length'])))
I get the correct result but how do I apply this logic to every group in a groupby?
for (a, b) in df.groupby(['start','end']):
    (df.assign(length=
        df['start'].lt(df['end'].shift())
        .mul(df['start']-df['end'].shift(fill_value=0))
        .add(df['length'])))
But this leaves the dataframe unchanged.
CodePudding user response:
Group the df on the required columns (ID and target), shift the end column within each group, then apply your formula as usual:
s = df.groupby(['ID', 'target'])['end'].shift()
df['length'] = df['start'].lt(s) * df['start'].sub(s.fillna(0)) + df['length']
  ID target  length   start     end
0  A      B   208.0  139403  139609
1  A      B   315.0  140668  140982
2  A      B  1987.0  141726  143711
3  A      B  3769.0  143705  147467
4  A      C   200.0     108     208
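For reference, the same grouped-shift idea can be wrapped in a small helper. This is only a sketch: the function name trim_overlaps and its keys parameter are made up for illustration, and it assumes the rows are already sorted by start within each group. It also casts length back to its original dtype, so the column stays integer instead of turning into floats.

```python
import pandas as pd


def trim_overlaps(df, keys=("ID", "target")):
    """Hypothetical helper: shorten `length` by the overlap with the
    previous row of the same group. Returns a new DataFrame."""
    # Previous row's `end` within each group (NaN for the first row of a group).
    prev_end = df.groupby(list(keys))["end"].shift()
    # A row overlaps when it starts before the previous row ended.
    # Comparisons against NaN are False, so first rows never count as overlapping.
    overlaps = df["start"].lt(prev_end)
    # start - prev_end is negative for overlapping rows; use 0 everywhere else.
    adjustment = (df["start"] - prev_end).where(overlaps, 0)
    new_length = (df["length"] + adjustment).astype(df["length"].dtype)
    # assign() returns a new DataFrame; the input is left untouched.
    return df.assign(length=new_length)


df = pd.DataFrame({'ID': ['A', 'A', 'A', 'A', 'A'],
                   'target': ['B', 'B', 'B', 'B', 'C'],
                   'length': [208, 315, 1987, 3775, 200],
                   'start': [139403, 140668, 141726, 143705, 108],
                   'end': [139609, 140982, 143711, 147467, 208]})

result = trim_overlaps(df)
print(result)
```

Note that assign returns a new DataFrame rather than modifying df in place, which is why the loop in the question appeared to do nothing: its result was never stored. Here the result is captured in result, and df itself keeps its original length values.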