Python Pandas: Group a column together by duplicates and join strings within a corresponding column

Time:10-13

I'm looking to group (groupby) the rows by PO Header ID and then concatenate the XML strings for all rows that share the same PO Header ID. I've found a few code samples, but they have been raising errors.

Ultimately, the Combined_XML column is what I am trying to produce.

PO Header ID   XML   Combined_XML
123           <test1> 
123           <test2> 
456           <test3> 
567           <test4> 
567           <test5> 
567           <test6> 

Desired output
PO Header ID   Combined_XML
123            <test1><test2>
456            <test3>
567            <test4><test5><test6>

Here is what I have tried so far:

    combineXML = df.groupby(['PO Header Id']).agg(['Combined_XML']).apply(list).reset_index()
    print(combineXML)

This throws KeyError: 'PO Header Id'. There are no spaces in the column name, so I am not sure why it is not working.

I also tried:

    df = df.groupby(['PO Header Id','XML'])['Combined_XML'].apply(''.join).reset_index()

CodePudding user response:

You could try it this way:

    df.groupby(['PO Header ID'])['XML'].apply(''.join).reset_index()
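
For reference, here is a minimal self-contained sketch of this approach. It assumes the sample data from the question's table; the DataFrame construction is reconstructed for illustration and is not the asker's actual code. Passing `name=` to `reset_index` is an optional extra that labels the joined column Combined_XML.

```python
import pandas as pd

# Sample data reconstructed from the question's table (an assumption).
df = pd.DataFrame({
    "PO Header ID": [123, 123, 456, 567, 567, 567],
    "XML": ["<test1>", "<test2>", "<test3>", "<test4>", "<test5>", "<test6>"],
})

# Select the string column, then join each group's values in row order.
combined = (
    df.groupby("PO Header ID")["XML"]
      .apply("".join)
      .reset_index(name="Combined_XML")
)
print(combined)
```

Note that `''.join` only works when every value in the XML column is a string, so missing values would likely need to be filled or dropped first.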

CodePudding user response:

You can use GroupBy.agg() with named aggregation as follows:

combineXML = df.groupby('PO Header ID', as_index=False).agg(Combined_XML=('XML', ''.join))

or use the following if your column name is actually PO Header Id:

combineXML = df.groupby('PO Header Id', as_index=False).agg(Combined_XML=('XML', ''.join))

Result:

print(combineXML)


   PO Header ID           Combined_XML
0           123         <test1><test2>
1           456                <test3>
2           567  <test4><test5><test6>
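
The KeyError in the question usually means the label passed to groupby does not exactly match a column label, for example a case difference (ID vs Id) or stray whitespace in the header. The sketch below shows one way to check for that before grouping. The helper `find_column` is hypothetical and not part of pandas, and the sample frame with a trailing space in its header is an assumption made up for illustration.

```python
import pandas as pd


def find_column(df: pd.DataFrame, wanted: str) -> str:
    """Hypothetical helper: return the actual column label matching `wanted`,
    ignoring case and surrounding whitespace; raise KeyError if none matches."""
    key = wanted.strip().lower()
    for col in df.columns:
        if str(col).strip().lower() == key:
            return col
    raise KeyError(f"{wanted!r} not found in {list(df.columns)!r}")


# Assumed sample frame whose header has a trailing space and upper-case 'ID'.
df = pd.DataFrame({
    "PO Header ID ": [123, 123, 456],
    "XML": ["<test1>", "<test2>", "<test3>"],
})

# Inspect the exact labels first; repr-style output makes stray spaces visible.
print(df.columns.tolist())

# Resolve the real label, then group with named aggregation as in the answer.
col = find_column(df, "PO Header Id")
combineXML = df.groupby(col, as_index=False).agg(Combined_XML=("XML", "".join))
print(combineXML)
```

If the labels do differ only by whitespace, renaming the columns once (for instance with `df.columns = df.columns.str.strip()`) may be simpler than looking them up each time.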