Create pandas subtraction column based on one other column in two conditions-CodePudding

I have a dataframe with subjects in two different conditions and many value columns.

d = {
     "subject": [1, 1, 2, 2], 
     "condition": ["on", "off", "on", "off"], 
     "value": [1, 2, 3, 5]
     }
df = pd.DataFrame(data=d)
df

	subject	condition	value
0	1	on	1
1	1	off	2
2	2	on	3
3	2	off	5

I would like to get new columns which indicate the difference off-on between both conditions. In this case I would like to get:

	subject	condition	value	off-on
0	1	on	1	1
1	1	off	2	1
2	2	on	3	2
3	2	off	5	2

How would I best do that?

I could achieve the result using this code:

onoff = (df[df.condition == "off"].value.reset_index() - df[df.condition == "on"].value.reset_index()).value
for idx, sub in enumerate(df.subject.unique()):
    df.loc[df.subject == sub, "off-on"] = onoff.iloc[idx]

But it seems quite tedious and slow. I was hoping for a solution without loop. I have many rows and very many value columns. Is there a better way?

CodePudding user response：

Use a pivot combined with map:

df['off-on'] = df['subject'].map(
 df.pivot(index='subject', columns='condition', values='value')
   .eval('off-on')
)

Or with a MultiIndex (more efficient than a pivot):

s = df.set_index(['condition', 'subject'])['value']
df['off-on'] = df['subject'].map(s['off']-s['on'])

Output:

   subject condition  value  off-on
0        1        on      1       1
1        1       off      2       1
2        2        on      3       2
3        2       off      5       2

timings

On 100k subjects

# MultiIndexing
43.2 ms ± 2.46 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# pivot
77 ms ± 12.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

CodePudding user response：

Use DataFrame.pivot for possible easy mapping subtracted column off and on by Series.map:

df1 = df.pivot(index='subject', columns='condition', values='value')

df['off-on'] = df['subject'].map(df1['off'].sub(df1['on']))
print (df)
   subject condition  value  off-on
0        1        on      1       1
1        1       off      2       1
2        2        on      3       2
3        2       off      5       2

Details:

print (df.pivot(index='subject', columns='condition', values='value'))
condition  off  on
subject           
1            2   1
2            5   3

print (df1['off'].sub(df1['on']))
subject
1    1
2    2
dtype: int64