Home > Net >  Create pandas subtraction column based on one other column and multiindex
Create pandas subtraction column based on one other column and multiindex

Time:02-01

I have asked a very similar question here, unfortunately, the toy problem is too simple for my case.

I have a dataframe with subjects in two different conditions and many value columns for various channels.

d = {
     "subject": [1, 1, 2, 2, 3, 3], 
     "condition": ["on", "off", "on", "off", "on", "off"], 
     "channel": [1, 1, 1, 1, 1, 2]
     "value": [1, 2, 3, 5, 4, 6]
     }
df = pd.DataFrame(data=d)
df
subject condition channel value
0 1 on 1 1
1 1 off 1 2
2 2 on 1 3
3 2 off 1 6
4 3 on 1 4
5 3 off 2 6

I would like to get new columns which indicate the difference off-on between both conditions for each channel. If one channel is missing in one condition, I would like to get nan. In this case I would like to get:

subject condition channel value value_off-on
0 1 on 1 1 1
1 1 off 1 2 1
2 2 on 1 3 3
3 2 off 1 6 3
4 3 on 1 4 nan
5 3 off 2 6 nan

How would I best do that?

I have to tried to extend the solutions here, however, this throws errors:

df['off-on'] = df.set_index(['subject', 'channel']).map(
 df.pivot(index=['subject', 'channel'], columns='condition', values='value')
   .eval('off-on')
)
AttributeError: 'DataFrame' object has no attribute 'map'
s = df.set_index(['condition', 'subject', 'channel'])['value']
df['off-on'] = df[['subject', 'channel']].map(s['off']-s['on'])
AttributeError: 'DataFrame' object has no attribute 'map'

What's the solution?

CodePudding user response:

You can use:

df['value_off-on'] = (df.groupby(['subject', 'channel'])['value']
                        .transform(lambda x: x.diff().iloc[-1]))
print(df)

# Output
   subject condition  channel  value  value_off-on
0        1        on        1      1           1.0
1        1       off        1      2           1.0
2        2        on        1      3           3.0
3        2       off        1      6           3.0
4        3        on        1      4           NaN
5        3       off        2      6           NaN

Note: If your dataframe is not already sorted by condition, you have to sort it before with df.sort_values('condition', ascending=False).groupby(...)

CodePudding user response:

Fix your existing solution with MultiIndex.map

s = df.pivot(index=['subject', 'channel'], columns='condition', values='value').eval('off-on')
df['off-on'] = df.set_index(['subject', 'channel']).index.map(s)

Alternatively you can use merge:

s = df.pivot(index=['subject', 'channel'], columns='condition', values='value').eval('off-on')
df.merge(s.reset_index(name='off-on'))

Result

   subject condition  channel  value  off-on
0        1        on        1      1     1.0
1        1       off        1      2     1.0
2        2        on        1      3     3.0
3        2       off        1      6     3.0
4        3        on        1      4     NaN
5        3       off        2      6     NaN

CodePudding user response:

Modification of my previous answer to work with multiple columns:

s = df.set_index(['condition', 'subject', 'channel'])['value']

# instead of map, use a merge
df['off-on'] = (df[['subject', 'channel']]
                .merge(s['off'].sub(s['on']).reset_index(), how='left')
               )['value']

# or indexing
df['off-on'] = df.set_index(['subject', 'channel']).index.map(s['off'].sub(s['on']))

Output:

   subject condition  channel  value  off-on
0        1        on        1      1     1.0
1        1       off        1      2     1.0
2        2        on        1      3     2.0
3        2       off        1      5     2.0
4        3        on        1      4     NaN
5        3       off        2      6     NaN
  • Related