I'm looking to group (groupby) the PO Header IDs and then concatenate the XML strings of all rows that share the same PO Header ID. I've found a few code samples, but I keep getting errors.
Ultimately, the Combined_XML column shown in the desired output is what I am trying to achieve.
PO Header ID  XML      Combined_XML
123           <test1>
123           <test2>
456           <test3>
567           <test4>
567           <test5>
567           <test6>
Desired output:
PO Header ID  Combined_XML
123           <test1><test2>
456           <test3>
567           <test4><test5><test6>
Here is what I have tried so far:
combineXML = df.groupby(['PO Header Id']).agg(['Combined_XML']).apply(list).reset_index()
print(combineXML)
This throws KeyError: 'PO Header Id'. There are no stray spaces in the column name, so I am not sure why it is not working.
I also tried:
df = df.groupby(['PO Header Id','XML'])['Combined_XML'].apply(''.join).reset_index()
Answer:
You could try it this way:
df.groupby(['PO Header ID'])['XML'].apply(''.join).reset_index()
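As a quick check, here is that approach run on the sample data from the question. This is a minimal sketch assuming the columns are named exactly 'PO Header ID' and 'XML' and every XML cell is a string; reset_index(name=...) is used so the result column comes out as Combined_XML.

```python
import pandas as pd

# Sample data from the question (assumed column names: 'PO Header ID' and 'XML').
df = pd.DataFrame({
    "PO Header ID": [123, 123, 456, 567, 567, 567],
    "XML": ["<test1>", "<test2>", "<test3>", "<test4>", "<test5>", "<test6>"],
})

# Group on the ID, select the XML column, and concatenate each group's strings.
# groupby keeps the original row order within each group, so <test1> comes before <test2>.
combined = (
    df.groupby("PO Header ID")["XML"]
      .apply("".join)
      .reset_index(name="Combined_XML")  # turn the grouped Series back into a DataFrame
)
print(combined)
```

Grouping only on the ID is the key difference from the second attempt in the question: adding 'XML' to the grouping keys puts every row in its own group, so nothing gets concatenated.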
Answer:
You can use GroupBy.agg() with named aggregation, as follows:
combineXML = df.groupby('PO Header ID', as_index=False).agg(Combined_XML=('XML', ''.join))
Or, if your column is actually named PO Header Id, use the following (column labels are case-sensitive, so 'PO Header Id' and 'PO Header ID' are different keys):
combineXML = df.groupby('PO Header Id', as_index=False).agg(Combined_XML=('XML', ''.join))
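If you still get a KeyError, print the exact column labels first. The sketch below uses a hypothetical header, 'PO Header Id ' with a lower-case d and a trailing space, to show how such a mismatch triggers the error and one way to normalise the labels before grouping.

```python
import pandas as pd

# Hypothetical frame: the header uses "Id" (lower-case d) and has a trailing space.
df = pd.DataFrame({"PO Header Id ": [123, 123], "XML": ["<test1>", "<test2>"]})

# Show the exact labels, including any hidden whitespace.
print(df.columns.tolist())

# Grouping by a label that does not exist raises KeyError.
try:
    df.groupby("PO Header ID")
except KeyError as exc:
    print("KeyError:", exc)

# Normalise: strip surrounding whitespace, then settle on one spelling.
df.columns = df.columns.str.strip()
df = df.rename(columns={"PO Header Id": "PO Header ID"})

combined = df.groupby("PO Header ID", as_index=False).agg(Combined_XML=("XML", "".join))
print(combined)
```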
Result:
print(combineXML)
   PO Header ID           Combined_XML
0           123         <test1><test2>
1           456                <test3>
2           567  <test4><test5><test6>
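One more common source of errors: str.join only accepts strings, so a missing (NaN) or numeric value in the XML column makes ''.join fail. A minimal sketch, assuming you are happy to skip missing cells, uses a small hypothetical helper, join_xml, as the aggregation function:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one XML cell is missing, as often happens after reading a CSV or Excel file.
df = pd.DataFrame({
    "PO Header ID": [123, 123, 456],
    "XML": ["<test1>", np.nan, "<test3>"],
})


def join_xml(values: pd.Series) -> str:
    """Hypothetical helper: drop missing cells, coerce the rest to str, then concatenate."""
    return "".join(values.dropna().astype(str))


combined = df.groupby("PO Header ID", as_index=False).agg(Combined_XML=("XML", join_xml))
print(combined)
```

Dropping missing cells is only one choice; if a missing fragment should be treated as an error instead, leave ''.join as-is so the failure surfaces.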