Assuming I have the dataset
ID cond press metrics
0 h1 new 30 1.2
1 h2 old 30 1.0
2 h3 new 50 1.5
3 h4 old 50 1.4
4 h5 new 70 1.8
I want to add another column called ratios
, which is the ratio between the same ID
& press
with cond
old/new. If pairs not found, fill the value with zero.
The final results will be:
ID cond press metrics ratios
0 h1 new 30 1.2 1.0/1.2
1 h2 old 30 1.0 0
2 h3 new 50 1.5 1.4/1.5
3 h4 old 50 1.4 0
4 h5 new 70 1.8 0
CodePudding user response:
ratios = (df.groupby('press')
.apply(lambda x: (x.loc[x.cond.eq('old'), 'metrics'].squeeze()
/x.loc[x.cond.eq('new'), 'metrics'].squeeze())
if set(x.cond) == {'new', 'old'} else 0))
ratios.name = 'ratios'
df.loc[df.cond.eq('new'), 'ratios'] = df[df.cond.eq('new')].merge(ratios, left_on='press', right_index=True)
df.ratios = df.ratios.fillna(0)
Output:
ID cond press metrics ratios
0 h1 new 30 1.2 0.833333
1 h2 old 30 1.0 0.000000
2 h3 new 50 1.5 0.933333
3 h4 old 50 1.4 0.000000
4 h5 new 70 1.8 0.000000
CodePudding user response:
Try:
old = df[df.cond.eq('old')].reset_index(drop=True).copy()
new = df[df.cond.eq('new')].reset_index(drop=True).copy()
new['metrics'] = pd.merge_asof(old, new, on='press', tolerance=0).eval('metrics_x / metrics_y')
new.fillna(0, inplace=True)
old['metrics'] = 0
final = pd.concat([new, old]).sort_values('ID').reset_index(drop=True)
Which gives:
>>> final
ID cond press metrics
0 h1 new 30 0.833333
1 h2 old 30 0.000000
2 h3 new 50 0.933333
3 h4 old 50 0.000000
4 h5 new 70 0.000000
It's a few lines but easily could be a function. Will leave that to you!