To explain my problem more easily, I have created a small dataset:
data = {'Cycle': ['Set1', 'Set1', 'Set1', 'Set2', 'Set2', 'Set2', 'Set2'],
        'Value': [1, 2.2, .5, .2, 1, 2.5, 1]}
I want to create a loop that goes through the "Cycle" column and marks the maximum Value of each cycle with the letter A and the minimum with the letter B, so that the output looks like this:
import pandas as pd

POI = {'Cycle': ['Set1', 'Set1', 'Set1', 'Set2', 'Set2', 'Set2', 'Set2'],
       'Value': [1, 2.2, .5, .2, 1, 2.5, 1],
       'POI': [0, 'A', 'B', 'B', 0, 'A', 0]}
df2 = pd.DataFrame(POI)
I am new to Python, so as much detail as possible would be very helpful. I am also not sure how to go through each cycle on its own to get these values, so an explanation of that would be great.
Thanks
Answer:
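Using numpy.select and groupby.transform. g.transform('max') returns, for every row, the maximum Value of that row's cycle (and likewise for 'min'), so comparing it with df['Value'] flags each cycle's maximum and minimum without an explicit loop. numpy.select then assigns the label of the first condition that is true, or the default otherwise: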
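import numpy as np
import pandas as pd

df = pd.DataFrame(data)  # data is the dict from the question
g = df.groupby('Cycle')['Value']
# First matching condition wins: each cycle's max -> 'A', its min -> 'B', else '0'
df['POI'] = np.select([df['Value'].eq(g.transform('max')),
                       df['Value'].eq(g.transform('min'))],
                      ['A', 'B'], default='0')
# if you want 0 as default value (not '0')
df['POI'] = df['POI'].replace('0', 0)
output:
  Cycle  Value POI
0  Set1    1.0   0
1  Set1    2.2   A
2  Set1    0.5   B
3  Set2    0.2   B
4  Set2    1.0   0
5  Set2    2.5   A
6  Set2    1.0   0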
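Since the question also asks how to go through each cycle on its own, here is a minimal sketch of an explicit-loop alternative, assuming the same data dict from the question. It is not part of the numpy.select answer above. It iterates over df.groupby('Cycle'), which yields one (cycle name, sub-DataFrame) pair per cycle, and uses idxmax/idxmin to find the row labels to mark.

```python
import pandas as pd

data = {'Cycle': ['Set1', 'Set1', 'Set1', 'Set2', 'Set2', 'Set2', 'Set2'],
        'Value': [1, 2.2, .5, .2, 1, 2.5, 1]}
df = pd.DataFrame(data)

# Start every row as 0 ("not marked"). Object dtype lets the column
# hold both the integer 0 and the letters 'A' / 'B'.
df['POI'] = pd.Series(0, index=df.index, dtype=object)

# Each iteration sees only the rows of a single cycle.
for cycle_name, group in df.groupby('Cycle'):
    max_label = group['Value'].idxmax()  # index label of the first maximum
    min_label = group['Value'].idxmin()  # index label of the first minimum
    df.loc[max_label, 'POI'] = 'A'
    df.loc[min_label, 'POI'] = 'B'

print(df)
```

This produces the same POI column as the expected output in the question. One difference from the transform version: idxmax and idxmin return only the first matching row, so if a cycle has a tied maximum or minimum, only one row is marked, and in a single-row cycle the 'B' overwrites the 'A'. The vectorized version is usually preferable on large data; the loop is mainly useful for seeing what "per cycle" means.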