I have asked a very similar question here, unfortunately, the toy problem is too simple for my case.
I have a dataframe with subjects in two different conditions and many value columns for various channels.
d = {
"subject": [1, 1, 2, 2, 3, 3],
"condition": ["on", "off", "on", "off", "on", "off"],
"channel": [1, 1, 1, 1, 1, 2]
"value": [1, 2, 3, 5, 4, 6]
}
df = pd.DataFrame(data=d)
df
subject | condition | channel | value | |
---|---|---|---|---|
0 | 1 | on | 1 | 1 |
1 | 1 | off | 1 | 2 |
2 | 2 | on | 1 | 3 |
3 | 2 | off | 1 | 6 |
4 | 3 | on | 1 | 4 |
5 | 3 | off | 2 | 6 |
I would like to get new columns which indicate the difference off-on between both conditions for each channel. If one channel is missing in one condition, I would like to get nan. In this case I would like to get:
subject | condition | channel | value | value_off-on | |
---|---|---|---|---|---|
0 | 1 | on | 1 | 1 | 1 |
1 | 1 | off | 1 | 2 | 1 |
2 | 2 | on | 1 | 3 | 3 |
3 | 2 | off | 1 | 6 | 3 |
4 | 3 | on | 1 | 4 | nan |
5 | 3 | off | 2 | 6 | nan |
How would I best do that?
I have to tried to extend the solutions here, however, this throws errors:
df['off-on'] = df.set_index(['subject', 'channel']).map(
df.pivot(index=['subject', 'channel'], columns='condition', values='value')
.eval('off-on')
)
AttributeError: 'DataFrame' object has no attribute 'map'
s = df.set_index(['condition', 'subject', 'channel'])['value']
df['off-on'] = df[['subject', 'channel']].map(s['off']-s['on'])
AttributeError: 'DataFrame' object has no attribute 'map'
What's the solution?
CodePudding user response:
You can use:
df['value_off-on'] = (df.groupby(['subject', 'channel'])['value']
.transform(lambda x: x.diff().iloc[-1]))
print(df)
# Output
subject condition channel value value_off-on
0 1 on 1 1 1.0
1 1 off 1 2 1.0
2 2 on 1 3 3.0
3 2 off 1 6 3.0
4 3 on 1 4 NaN
5 3 off 2 6 NaN
Note: If your dataframe is not already sorted by condition
, you have to sort it before with df.sort_values('condition', ascending=False).groupby(...)
CodePudding user response:
Fix your existing solution with MultiIndex.map
s = df.pivot(index=['subject', 'channel'], columns='condition', values='value').eval('off-on')
df['off-on'] = df.set_index(['subject', 'channel']).index.map(s)
Alternatively you can use merge
:
s = df.pivot(index=['subject', 'channel'], columns='condition', values='value').eval('off-on')
df.merge(s.reset_index(name='off-on'))
Result
subject condition channel value off-on
0 1 on 1 1 1.0
1 1 off 1 2 1.0
2 2 on 1 3 3.0
3 2 off 1 6 3.0
4 3 on 1 4 NaN
5 3 off 2 6 NaN
CodePudding user response:
Modification of my previous answer to work with multiple columns:
s = df.set_index(['condition', 'subject', 'channel'])['value']
# instead of map, use a merge
df['off-on'] = (df[['subject', 'channel']]
.merge(s['off'].sub(s['on']).reset_index(), how='left')
)['value']
# or indexing
df['off-on'] = df.set_index(['subject', 'channel']).index.map(s['off'].sub(s['on']))
Output:
subject condition channel value off-on
0 1 on 1 1 1.0
1 1 off 1 2 1.0
2 2 on 1 3 2.0
3 2 off 1 5 2.0
4 3 on 1 4 NaN
5 3 off 2 6 NaN