Input Dataframe With TerminalID,TName,XY cordinate, PeopleID
import pandas as pd
data = {
'TerminalID': ['5','21','21','2','21','2','5','22','22','22','2','32','41','41','42','50','50'],
'TName': ['AD','AMBO','AMBO','PS','AMBO','PS','AD','AM','AM','AM','PS','BO','BA','BA','BB','AZ','AZ'],
'xy': ['1.12731,1.153756','0.12731,0.153757','0.12731,0.153757','1.989385,1.201941','0.12731,0.153757','1.989385,1.201941','1.12731,1.153756','2.12731,1.153756','2.12731,1.153756','2.12731,1.153756','1.989385,1.201941','1.989385,1.201941','2.989385,1.201941','2.989385,1.201941','2.989385,3.201941','3.989385,3.201941','3.989385,3.201941'],
'Pcode': [ 'None','Z014','Z015','Z016','Z017','Z018','None','Z020','Z021','Z022','Z023','Z024','Z025','Z026','Z027','Z028','Z029']
}
df = pd.DataFrame.from_dict(data)
Out[55]:
output of DF1
TerminalID TName xy Pcode
0 5 AD 1.12731,1.153756 None
1 21 AMBO 0.12731,0.153757 Z014
2 21 AMBO 0.12731,0.153757 Z015
3 2 PS 1.989385,1.201941 Z016
4 21 AMBO 0.12731,0.153757 Z017
5 2 PS 1.989385,1.201941 Z018
6 5 AD 1.12731,1.153756 None
7 22 AM 2.12731,1.153756 Z020
8 22 AM 2.12731,1.153756 Z021
9 22 AM 2.12731,1.153756 Z022
10 2 PS 1.989385,1.201941 Z023
11 32 BO 1.989385,1.201941 Z024
12 41 BA 2.989385,1.201941 Z025
13 41 BA 2.989385,1.201941 Z026
14 42 BB 2.989385,3.201941 Z027
15 50 AZ 3.989385,3.201941 Z028
16 50 AZ 3.989385,3.201941 Z029
DF2,
T_cap is the capacity requirement at the Terminal ID and the T_load is the Load details, Tcap is the running count increment and T_load is the actual request at the Termainal, The 0 at the start and the end are padding for the solution
data2= {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','2','2','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD']
}
df2 = pd.DataFrame.from_dict(data2)
Out[59]:
BusID Tcap T_Load TerminalID TName
0 18 0 0 5 AD
1 18 2 2 21 AMBO
2 18 3 1 33 AM
3 18 6 2 2 PS
4 18 7 2 32 BO
5 18 8 1 42 BB
6 18 10 2 41 BA
7 18 12 2 50 AZ
8 18 12 0 5 AD
Data Frame # the Final output requested
The output is based on the T_Load contstrains.
data3 = {
'BusID': ['18','18','18','18','18','18','18','18','18'],
'Tcap': ['0','2','3','6','7','8','10','12','12'],
'T_Load': ['0','2','1','3','1','1','2','2','0'],
'TerminalID': [ '5','21','33','2','32','42','41','50','5'],
'TName':['AD','AMBO','AM','PS','BO','BB','BA','AZ','AD'],
'Pcode':['None','Z013,Z019','Z020','Z016,Z018,Z023','Z024','Z027','Z025,Z026','Z028,Z029','None']
}
df3 = pd.DataFrame.from_dict(data3)
Out[61]:
BusID Tcap T_Load TerminalID TName Pcode
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z013,Z019
2 18 3 1 33 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
Thanking you
CodePudding user response:
My solution aggreagte join per TerminalID
and TName
and assign to another DataFrame by aggreagte list, last filter values by positions in list comprehension with join
:
s = df.groupby(['TerminalID','TName'])['Pcode'].agg(list).rename('P_list')
df = df2.join(s, on=['TerminalID','TName'])
df['P_list'] = [','.join(x[:int(y)]) if int(y) != 0 else None
for x, y in zip(df['P_list'], df['T_Load'])]
print (df)
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
CodePudding user response:
You can map
the aggregated strings per TName:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode'].agg(','.join))
or, if you want to replace multiple string None as single one:
df2['Plist'] = df2['TName'].map(df.groupby('TName')['Pcode']
.agg(lambda x: ','.join(e for e in x if e != 'None'))
.replace('', 'None')
)
output:
BusID Tcap T_Load TerminalID TName Plist
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015,Z017
2 18 3 1 22 AM Z020,Z021,Z022
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None
update: limiting the output:
you can then trim the column with a regex, we can use a groupby
to benefit from vectorized string operations within each group (this is mostly interesting if there are few groups and many rows):
df2['P_list'] = (df2.groupby('T_Load')['P_list']
.apply(lambda c: c.str.extract(rf'((?:[^,] ,?){{,{str(c.name)}}})',
expand=False)
.str.strip(',')
)
.replace('', 'None')
)
output:
BusID Tcap T_Load TerminalID TName P_list
0 18 0 0 5 AD None
1 18 2 2 21 AMBO Z014,Z015
2 18 3 1 22 AM Z020
3 18 6 3 2 PS Z016,Z018,Z023
4 18 7 1 32 BO Z024
5 18 8 1 42 BB Z027
6 18 10 2 41 BA Z025,Z026
7 18 12 2 50 AZ Z028,Z029
8 18 12 0 5 AD None