Home > OS >  How to select rows based on conditions applied to the columns and use that to create another column
How to select rows based on conditions applied to the columns and use that to create another column

Time:08-11

I have my input data as below:

import pandas as pd
df = pd.DataFrame({
    "Pillar": ["block", "segment", "block", "block", "segment",
              "segment", "block",  "block", "block"],
    "Pillar ID": [1, 1, 1, 2, 2, 3, 2, 1, 1],
    "Distance" : [1.5, 3, 4, 5, 7, 7.8, 9, 3, 6]
})
 df
#   Pillar    Pillar ID     Distance
# 0 Block       1             1.5
# 1 Segment     1             3
# 2 Block       1             4
# 3 Block       2             5
# 4 Segment     2             7 
# 5 Segment     3             7.8 
# 6 Block       2             9
# 7 Block       1             3
# 8 Block       1             6

I want an output to get all the "Block" with two consecutive equal "Pillar ID" and assign "start" to the block with the lesser distance among the two consecutive blocks with same ID and assign "stop" to the block with the greater distance among the two consecutive blocks with same ID to have an output as below:

#   Pillar    Pillar ID     Distance   Position
# 0 Block       1             1.5      Start
# 2 Block       1             4        Stop
# 3 Block       2             5        Start 
# 6 Block       2             9        Stop
# 7 Block       1             3        Start
# 8 Block       1             6        Stop

CodePudding user response:

Check Below code:

df.sort_values(['Pillar'], inplace=True)

df['Pillar ID_Shift'] =  (df['Pillar ID'] != df.groupby(['Pillar'])['Pillar ID'].shift()).cumsum()
df['Position'] = np.where(df['Distance'] == df.groupby(['Pillar','Pillar ID_Shift'])['Distance'].transform('min'), 'Start', 'Stop')
df[df['Pillar'] == 'block'][['Pillar', 'Pillar ID', 'Distance', 'Position']]

Output:

enter image description here

  • Related