Home > Enterprise >  Groupby and Apply Functions on multiple Columns with 1-to-many relationship
Groupby and Apply Functions on multiple Columns with 1-to-many relationship

Time:11-10

I am trying to consolidate a one-to-many relationship in my dataset while creating a link for each, and sometimes multiple items. For a given Ticket, there can be 1 or many Work Orders, so I have used the following to consolidate and list Work Orders in a single column:

df = df.groupby(['Date', 'Ticket ID', 'Score', 'many other Columns...'])['Work Order'].apply(', '.join).reset_index()

Which gives me a nice output where Work Orders are either listed individually or comma separted like so:

Date                        Ticket ID           Work Order
2018-08-30 22:52:25         1444008             119846184
2021-09-29 13:33:49         1724734             122445397, 122441551

Now, I want to create links for the Work Orders as well that are also comma delimited. For the single Work Orders the following works just fine:

woLink = r'www.google.com/woNum='
df['WO Link'] = woLink   df['Work Order'].astype(str)

However for the multiple it essentially undoes the groupby and re-adds the row that I grouped by. From the sample data above like:

# Wrong
Date                        Ticket ID           Work Order           Link
2018-08-30 22:52:25         1444008             119846184            google.com/woNum=119846184
2021-09-29 13:33:49         1724734             122445397            google.com/woNum=122445397
2021-09-29 13:33:49         1724734             122441551            google.com/woNum=122441551

What I want is:

Date                        Ticket ID           Work Order           Link (s)
2018-08-30 22:52:25         1444008             119846184            google.com/woNum=119846184
2021-09-29 13:33:49         1724734             122445397, 122441551 google.com/woNum=122445397, google.com/woNum=122441551

I have tried several different variations of Groupby/apply like regrouping without success:

df = df.groupby(['Date', 'Ticket ID', 'Score', 'many other Columns...'])['Link'].apply(', '.join).reset_index()

It always seems to duplicate the row again - I know this is because WO and WOLink both have a 1 to Many, but I cannot seem to figure how to handle both at once.

How can I groupby and consolidate this dataframe?

CodePudding user response:

You can use a regex to add the URL part:

woLink = r'example.org/woNum='
df['Link'] = df['Work Order'].str.replace('(\d )', rf'{woLink}\1')

output:

                  Date  Ticket ID            Work Order                                                      Link
0  2018-08-30 22:52:25    1444008             119846184                               example.org/woNum=119846184
1  2021-09-29 13:33:49    1724734  122445397, 122441551  example.org/woNum=122445397, example.org/woNum=122441551
  • Related