Sorry about the title but didn't know the best way to put this :).
So on the first file, I have Old and New data. And I also have another file that has only the old data (it's the IPs configured on the network).
but I can have for the same NUMBER on the old data, multiple equipment's that change the NE_NAME, so I need to find by the NEXT_HOP_IP.
the problem is I only have the NEXT_HOP_IP match for the NE_NAME only for the old data but I need both to have the same NE_NAME.
I am working with pandas and I am trying to understand if using join or merge I can achieve something like this or if I need to use another solution?
Example:
The file with new/old data
NUMBER route_id route_type NEXT_HOP_IP
14738 14738_site_1 new 10.43.148.221
14738 14738_site_1 old 10.43.148.217
79470 79470_site_6 new 10.43.148.101
79470 79470_site_6 old 10.43.148.241
The file containing old data (duplicate NE_NAME but unique NEXT_HOP_IP)
NE_NAME NEXT_HOP_IP
ABC14738_A 10.43.148.217
ABC14738_A ...
ABC79470_B 10.43.148.241
ABC79470_B ...
ABC14738_B xx.xx.xx.xx
ABC79470_C xx.xx.xx.xx
Final result
NUMBER route_id route_type NEXTHOP Name
14738 14738_site_1 new 10.43.148.221 ABC14738_A
14738 14738_site_1 old 10.43.148.217 ABC14738_A
79470 79470_site_6 new 10.43.148.101 ABC79470_B
79470 79470_site_6 old 10.43.148.241 ABC79470_B
Thanks in advance for any help
CodePudding user response:
Left merge the dataframes then group the merged frame by NUMBER
and transform
NE_NAME
with first
to select the first non null value per group
df1['NAME'] = df1.merge(df2, how='left').groupby('NUMBER')['NE_NAME'].transform('first')
NUMBER route_id route_type NEXT_HOP_IP NAME
0 14738 14738_site_1 new 10.43.148.221 ABC14738_A
1 14738 14738_site_1 old 10.43.148.217 ABC14738_A
2 79470 79470_site_6 new 10.43.148.101 ABC79470_B
3 79470 79470_site_6 old 10.43.148.241 ABC79470_B