Home > Net >  How to summarize the data to get separate column for each rank in pandas?
How to summarize the data to get separate column for each rank in pandas?

Time:09-07

Table:

Gate    Status   BUS    Time
Gate1   Vacant   102    10:00
Gate2   Occupied 105    10:01
Gate3   Vacant   114    10:00
Gate4   Occupied 123    9:55
Gate5   Occupied 127    10:00
Gate6   Occupied 126    10:01
Gate7   Occupied 106    10:01
Gate1   Vacant   101    10:15
Gate2   Occupied 113    10:02
Gate3   Vacant   116    10:01
Gate4   Occupied 124    10:00
Gate5   Occupied 129    10:00
Gate7   Occupied 125    10:20
Gate1   Vacant   103    10:25
Gate2   Occupied 112    10:15
Gate3   Vacant   120    10:30
Gate4   Occupied  99    10:15
Gate5   Occupied 131    10:05
Gate1   Vacant   107    10:29
Gate2   Occupied 115    10:30
Gate3   Vacant   125    10:31
Gate5   Occupied 130    10:15
Gate1   Vacant   111    10:30
Gate2   Occupied 121    10:30
Gate1   Vacant   104    10:32
Gate2   Occupied 122    10:31

Desired Result:

Gate    Status   Rank#1 Rank#2  Rank#3  Rank#4  Rank#5...Rank#10 Rank#11...
Gate1   Vacant   102    101     103     107     109      111        104
Gate2   Occupied 105    113     112     115     117      121        122
Gate3   Vacant   114    116     120     125         
Gate4   Occupied 123    124      99             
Gate5   Occupied 127    129     131     130         
Gate6   Occupied 126                        
Gate7   Occupied 106    125     

        

Rank Is given based on time for each group Where Rank 1 is for earliest time and last Rank is for latest

I have tried the following one:

data_cy_grp2['Rank'] = data_cy_grp2[['Gate','Status','BUS']].sort_values(['Gate','Status','BUS'], ascending=[True, True, False]).groupby(['Gate']).cumcount()   1

data_cy_grp2[['Gate','Status','BUS','Rank']].pivot(columns='Rank', values='BUS').apply(lambda x: pd.Series(x.dropna().values))

I am not getting the optimum output.

Please team support.

CodePudding user response:

Convert the Time column to datetime then rank the values per Gate and Status then pivot the dataframe using ranks as column names

df['time'] = pd.to_datetime(df.Time)
df['rank'] = df.groupby(['Gate', 'Status'])['time'].rank(method='first').astype('int')

df.pivot(['Gate', 'Status'], 'rank', 'BUS').add_prefix('Rank#')

Result

rank            Rank#1  Rank#2  Rank#3  Rank#4  Rank#5  Rank#6
Gate  Status                                                  
Gate1 Vacant     102.0   101.0   103.0   107.0   111.0   104.0
Gate2 Occupied   105.0   113.0   112.0   115.0   121.0   122.0
Gate3 Vacant     114.0   116.0   120.0   125.0     NaN     NaN
Gate4 Occupied   123.0   124.0    99.0     NaN     NaN     NaN
Gate5 Occupied   127.0   129.0   131.0   130.0     NaN     NaN
Gate6 Occupied   126.0     NaN     NaN     NaN     NaN     NaN
Gate7 Occupied   106.0   125.0     NaN     NaN     NaN     NaN
  • Related