Home > Enterprise >  Combining two columns in a dataframe in a certain order
Combining two columns in a dataframe in a certain order

Time:07-02

I'm trying to create a new column in the below sample data frame that combines the data from the customer_response and the bot_response into a list shown in the combined column of the second table shown below.

One thing to note is that there will always be the same number of delimiters for the customer and bot column for each row.

df

conv_id customer_response bot_response
1 Make a payment; Credit Card; $1000; Thanks! I can help with that. Which account?; What amount?; Payment successful; You're Welcome!
2 Replace a card; Ending in 4352; overnight; thanks I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem
3 What is my due date?; Am I past due?; thanks Hello, your due date is the 3rd of each month; No, your account is current; you're welcome!

Desired output:

conv_id customer_response bot_response combined
1 Make a payment; Credit Card; $1000; Thanks! I can help with that. Which account?; What amount?; Payment successful; You're Welcome! ["Customer: Make a payment", "Bot: I can help with that. Which account?", "Customer: Credit Card", "Bot: What amount?", "Customer: $1000", "Bot: Payment successful", "Customer: Thanks!", "Bot: You're Welcome!"]
2 ... ... ...
3 ... ... ...

Here is my code so far but for some reason I'm drawing a blank on how to create a column to combine the two as shown in the combined column in the second table above.

df['cust_dil'] = [x for x in df['customer_response'].str.split(';')]
df['bot_dil'] = [x for x in df['bot_response'].str.split(';')]

At a high level, I know I need to loop through the elements in each row after splitting them and add the the respective text "Bot" or "Customer" before each element in the list and then combine the 2 columns in order.

I'm really stumped on this so any help that can be given is greatly appreciated.

CodePudding user response:

When you want to create a column whose value depends on more than one other column, DataFrame.apply is a good way to do it. You might try something like this

def combine_responses(row):
    cust_dil = [x for x in row.customer_response.str.split(';')]
    bot_dil = [x for x in row.bot_response.str.split(';')]
    ## Write some code to interleave these.
    ## https://stackoverflow.com/questions/7946798/interleave-multiple-lists-of-the-same-length-in-python
    return [val for pair in zip(cust_dil, bot_dil) for val in pair]

df['combined'] = df.apply(combine_responses, axis=1)

CodePudding user response:

Here is a more pandaic approach to your problem with no loops, comprehension list and apply logic.

import pandas as pd
import numpy as np

data_dict= {
    'conv_id': [1, 2, 3],
    'customer_response': ['Make a payment; Credit Card; $1000; Thanks!', 'Replace a card; Ending in 4352; overnight; thanks', 'What is my due date?; Am I past due?; thanks'],
    'bot_response': ['I can help with that. Which account?; What amount?; Payment successful; You\'re Welcome!', 'I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem', 'Hello, your due date is the 3rd of each month; No, your account is current; you\'re welcome!']}

df = pd.DataFrame(data_dict)
bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)

bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns)   len(bot_response_split.columns))

new_columns = list(customer_response_split.columns)   list(bot_response_split.columns)
new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)

df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()

Code explanation :

  • We first split the column on the delimiter ;, those many columns will be created as the number of chats for each cell in the column, then we fill all our None values with empty string, then based on regex we prepend the appropriate third person in the chat eg Bot and customer here. After that we agin apply regex to replace those cells by empty string for which applymap added 'Customer : ' but the cell was empty.
bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)
  • Secondly we rename the column names for the bot_response_datframe as this will help in interleaving.
bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns)   len(bot_response_split.columns))
  • Next we join the column list from customer and bot dataframe to a third column list which will help us in interleaving new_columns
new_columns = list(customer_response_split.columns)   list(bot_response_split.columns)
  • From the customer dataframe we take the column list and insert it to new_columns alternatively. We are doing the same for bot dataframe column as well.
new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)
  • In the end, we concat our customer and bot dataframe and re order our indexes such that the columns are interleaved, then we join all the columns so that the values are interleaved and insert it to a third column in df called combined
df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()

Output :

enter image description here

  • Related