Merge two rows of data and join values if unique

I am re-writing a VBScript that I wrote in Excel a while ago in Python Pandas. The reason for this is due to it breaking regularly as using VBS Macros in an Apple version of Microsoft Office is unstable.

The script creates three separate CSVs from one large XLS export from our database which are for three different Mail lists. unfortunately, we have no access to edit how the export is made so I am trying to make it quick and easy for the people who need these mail lists updated regularly.

Due to it containing confidential and personal information I won't use real data but will make some dummy data instead.

I am up to a part where there is one contact person for two clients and I need to merge and join this data. Getting it to the file is the easy part but merging and joining is the hard part. Here is an example of what I'm trying to do:

In words, I want pandas to detect a duplicate contact merge the identical data and add a Join with an & separation in the unique data being joined.

RAW:

Last Name	First Name	Preferred Name	Type	ID	Contact First Name	Contact Last Name	Email
TMart	Bob	Bob	TypeA	22	Sully	TMart	[email protected]
TMart	Jannet	Jan	TypeB	23	Sully	TMart	[email protected]

Required Output:

Last Name	First Name	Preferred Name	Type	ID	Contact First Name	Contact Last Name	Email
TMart	Bob & Jannet	Bob & Jan	TypeA & TypeB	22 & 23	Sully	TMart	[email protected]

Below is the line of Code that would work for one Collumn but if I added multiple to it, it would break and just merge the column names:

df2 = df1.groupby(['Last Name','ID','Contact First Name','Contact Last Name','Email'])['First Name'].apply(' & '.join).reset_index()

Which would Output exactly what I want: Bob & Jannet But, if I added more to the .apply object.

df2 = df1.groupby(['Last Name','ID','Contact First Name','Contact Last Name','Email'])['First Name','ID'].apply(' & '.join).reset_index()

It would output First Name & ID instead of the data in those columns.

I am not fully sure this method will be the right one as two people could have the same Type and I don't want it to join those together.

Sorry if I am not clear enough, did my best to explain what I was aiming for. Doesn't have to be written this way, happy to be shown a new and more efficient way but instead of spending days figuring it out I thought I would ask the experts.

Thanks!

CodePudding user response：

Use GroupBy.agg and add nested lists for processing columns by aggregate function []:

df2 = df1.groupby(['Last Name','ID','Contact First Name','Contact Last Name','Email'])[['First Name','ID']].agg(' & '.join).reset_index()