For example my df contains below data :
Part,Res_val,res_date,id,mass,start,sec_id ( COLUMNS ) AND BELOW ARE VALUES.
Part,res_value,res_date,id,sample_number,start,sec_id
ABC1,4,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC2,2,01/07/2022 11:27:43,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC3,3,01/06/2022 08:12:39,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,4,01/06/2022 08:12:39,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC2,5,01/10/2022 15:15:14,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC3,2,01/11/2022 17:28:56,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,2,01/11/2022 17:28:56,GKK123,1,2022-10-03 19:35:14,AHJ234
ABC1,4,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC2,10,01/07/2022 11:27:43,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC3,3,01/06/2022 08:12:39,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC1,4,01/06/2022 08:12:39,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC2,5,01/10/2022 15:15:14,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC3,2,01/11/2022 17:28:56,GKK122,1,2022-10-03 19:35:14,AHJ233
ABC1,2,01/11/2022 17:28:56,GKK122,1,2022-10-03 19:35:14,AHJ233
Result :
ABC1_result,ABC2_result,res_date,id,sample_number,start,sec_id
4,2,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
4,10,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233
I want to select recent row based on time and I want to check if the sec_id having part ABC1 or ABC2 then I will take res_value respectively for that row otherwise the column would be null
So the output should look like : ABC1_result,ABC2_result,res_date,id,sample_number,start,sec_id
4,2,01/10/2022 15:15:15,GKK123,1,2022-10-03 19:35:14,AHJ234
4,10,01/10/2022 15:15:15,GKK122,1,2022-10-03 19:35:14,AHJ233
CodePudding user response:
If need test Part
if match list and then need pivoting use:
L = ['ABC1','ABC3']
df1 = df[df['Part'].isin(L)].set_index(['Part'], append=True)['Res_val'].unstack().add_prefix('Res_val_')
df = pd.concat([df1, df.drop(['Part','Res_val'], axis=1)], axis=1)
print (df)
Res_val_ABC1 Res_val_ABC3 res_date id mass \
0 1.0 NaN 09/11/2002 16:14:12 A458 5
1 NaN NaN 05/04/2002 10:23:43 A352 6
2 NaN 6.0 05/06/2002 09:18:39 B425 7
start sec_id
0 2002-16-07 23:35:14 530HQ
1 2002-14-05 12:56:14 2345T
2 2002-14-03 12:12:56 25F2
If need top values per list by datetimes use:
df['res_date'] = pd.to_datetime(df['res_date'], dayfirst=True)
L = ['ABC1','ABC2']
df1 = df[df['Part'].isin(L)].sort_values(['sec_id','res_date']).drop_duplicates(['Part','sec_id'])
print (df1)
Part res_value res_date id sample_number \
10 ABC1 4 2022-06-01 08:12:39 GKK122 9313.92
8 ABC2 10 2022-07-01 11:27:43 GKK122 9313.92
3 ABC1 4 2022-06-01 08:12:39 GKK123 9313.92
1 ABC2 2 2022-07-01 11:27:43 GKK123 9313.92
start sec_id
10 2022-10-03 19:35:14 AHJ233
8 2022-10-03 19:35:14 AHJ233
3 2022-10-03 19:35:14 AHJ234
1 2022-10-03 19:35:14 AHJ234
Because same values per groups by sec_id
is possible pivoting like:
df2 = (df1.pivot(index=['id','start','sample_number','sec_id'],
columns='Part',
values='res_value')
.add_suffix('_res_value')
.reset_index())
print (df2)
Part id start sample_number sec_id ABC1_res_value \
0 GKK122 2022-10-03 19:35:14 9313.92 AHJ233 4
1 GKK123 2022-10-03 19:35:14 9313.92 AHJ234 4
Part ABC2_res_value
0 10
1 2
CodePudding user response:
I think you want to do this?
res = df.copy()
res['Part'] = df.loc[df.Part=='ABC1', 'Res_val'].reindex(df.index)
res['Res_val'] = df.loc[df.Part=='ABC3', 'Res_val'].reindex(df.index)