Python Pandas: Group a column together by duplicates and join strings within a corresponding column

I'm looking to group (groupby) the PO Header IDs and then concatenate the XML strings for all rows that share the same PO Header ID. I've stumbled upon a few code samples but keep getting errors.

Ultimately, the column Combined_XML is what I am looking to achieve.

PO Header ID   XML   Combined_XML
123           <test1> 
123           <test2> 
456           <test3> 
567           <test4> 
567           <test5> 
567           <test6> 

Desired output
PO Header ID   Combined_XML
123            <test1><test2>
456            <test3>
567            <test4><test5><test6>

Here is what I have tried so far:

    combineXML = df.groupby(['PO Header Id']).agg(['Combined_XML']).apply(list).reset_index()
    print(combineXML)

This throws KeyError: 'PO Header Id'. There are no spaces in the column name, so I am not sure why it is not working. I also tried:

    df = df.groupby(['PO Header Id','XML'])['Combined_XML'].apply(''.join).reset_index()

CodePudding user response:

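You could try it this way. Group by the ID column only (it is spelled PO Header ID in your sample) and join the strings in the XML column:

    df.groupby(['PO Header ID'])['XML'].apply(''.join).reset_index()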
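For reference, here is a minimal, self-contained sketch of that approach. The DataFrame is rebuilt by hand from the sample in the question, so the exact column names and the assumption that every XML value is a plain string (no NaN) are mine, not confirmed by the asker. The `name="Combined_XML"` argument is only there so the result column matches the desired output.

```python
import pandas as pd

# Sample data rebuilt from the question; the real DataFrame is assumed to look like this.
df = pd.DataFrame({
    "PO Header ID": [123, 123, 456, 567, 567, 567],
    "XML": ["<test1>", "<test2>", "<test3>", "<test4>", "<test5>", "<test6>"],
})

# Group by the ID only, then join the XML strings of each group in row order.
combined = (
    df.groupby("PO Header ID")["XML"]
      .apply("".join)
      .reset_index(name="Combined_XML")
)
print(combined)
```

If the XML column can contain missing values, `''.join` will raise a TypeError, so you would probably need to drop or fill those rows first.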

CodePudding user response:

You can use GroupBy.agg() with named aggregation as follows:

    combineXML = df.groupby('PO Header ID', as_index=False).agg(Combined_XML=('XML', ''.join))

or use the following if your column name is actually PO Header Id:

    combineXML = df.groupby('PO Header Id', as_index=False).agg(Combined_XML=('XML', ''.join))

Result:

    print(combineXML)

       PO Header ID           Combined_XML
    0           123         <test1><test2>
    1           456                <test3>
    2           567  <test4><test5><test6>
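As for the KeyError in the question, it usually means the string passed to groupby does not exactly match a column label, for example because of different casing (ID vs Id) or stray whitespace in the header. The sketch below is one way to check this. The DataFrame with a trailing space in the header is a made-up example, and the `wanted`, `matches` and `key` names are only for illustration.

```python
import pandas as pd

# Hypothetical frame: the header differs in casing and has a trailing space.
df = pd.DataFrame({
    "PO Header ID ": [123, 123, 456],
    "XML": ["<test1>", "<test2>", "<test3>"],
})

# repr() makes stray whitespace and casing differences visible.
print([repr(c) for c in df.columns])

# Strip surrounding whitespace from every column label.
df.columns = df.columns.str.strip()

# Find the real label with a case-insensitive comparison.
wanted = "PO Header Id"
matches = [c for c in df.columns if c.lower() == wanted.lower()]
key = matches[0]  # raises IndexError if no column matches at all

# Same named aggregation as above, using the label that actually exists.
combineXML = df.groupby(key, as_index=False).agg(Combined_XML=("XML", "".join))
print(combineXML)
```

Printing the labels first is usually enough to spot the mismatch. The case-insensitive lookup is just a convenience for when you do not control the header.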