I'm trying to create a new column in the below sample data frame that combines the data from the customer_response and the bot_response into a list shown in the combined column of the second table shown below.
One thing to note is that there will always be the same number of delimiters for the customer and bot column for each row.
df
conv_id | customer_response | bot_response |
---|---|---|
1 | Make a payment; Credit Card; $1000; Thanks! | I can help with that. Which account?; What amount?; Payment successful; You're Welcome! |
2 | Replace a card; Ending in 4352; overnight; thanks | I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem |
3 | What is my due date?; Am I past due?; thanks | Hello, your due date is the 3rd of each month; No, your account is current; you're welcome! |
Desired output:
conv_id | customer_response | bot_response | combined |
---|---|---|---|
1 | Make a payment; Credit Card; $1000; Thanks! | I can help with that. Which account?; What amount?; Payment successful; You're Welcome! | ["Customer: Make a payment", "Bot: I can help with that. Which account?", "Customer: Credit Card", "Bot: What amount?", "Customer: $1000", "Bot: Payment successful", "Customer: Thanks!", "Bot: You're Welcome!"] |
2 | ... | ... | ... |
3 | ... | ... | ... |
Here is my code so far but for some reason I'm drawing a blank on how to create a column to combine the two as shown in the combined column in the second table above.
df['cust_dil'] = [x for x in df['customer_response'].str.split(';')]
df['bot_dil'] = [x for x in df['bot_response'].str.split(';')]
At a high level, I know I need to loop through the elements in each row after splitting them and add the the respective text "Bot" or "Customer" before each element in the list and then combine the 2 columns in order.
I'm really stumped on this so any help that can be given is greatly appreciated.
CodePudding user response:
When you want to create a column whose value depends on more than one other column, DataFrame.apply
is a good way to do it. You might try something like this
def combine_responses(row):
cust_dil = [x for x in row.customer_response.str.split(';')]
bot_dil = [x for x in row.bot_response.str.split(';')]
## Write some code to interleave these.
## https://stackoverflow.com/questions/7946798/interleave-multiple-lists-of-the-same-length-in-python
return [val for pair in zip(cust_dil, bot_dil) for val in pair]
df['combined'] = df.apply(combine_responses, axis=1)
CodePudding user response:
Here is a more pandaic
approach to your problem with no loops
, comprehension list
and apply
logic.
import pandas as pd
import numpy as np
data_dict= {
'conv_id': [1, 2, 3],
'customer_response': ['Make a payment; Credit Card; $1000; Thanks!', 'Replace a card; Ending in 4352; overnight; thanks', 'What is my due date?; Am I past due?; thanks'],
'bot_response': ['I can help with that. Which account?; What amount?; Payment successful; You\'re Welcome!', 'I can help replace your card, which account?; How soon do you need it; The replacement card will arrive in 3-5 business days; No problem', 'Hello, your due date is the 3rd of each month; No, your account is current; you\'re welcome!']}
df = pd.DataFrame(data_dict)
bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)
bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns) len(bot_response_split.columns))
new_columns = list(customer_response_split.columns) list(bot_response_split.columns)
new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)
df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()
Code explanation :
- We first split the column on the delimiter
;
, those many columns will be created as the number of chats for each cell in the column, then we fill all ourNone
values withempty string
, then based on regex we prepend the appropriate third person in the chat egBot and customer
here. After that we agin apply regex to replace those cells byempty string
for which applymap added 'Customer : ' but the cell was empty.
bot_response_split = df["bot_response"].str.split(';', expand=True).fillna("").applymap('Bot : {}'.format).replace('Bot : $', '',regex=True)
customer_response_split = df["customer_response"].str.split(';', expand=True).fillna("").applymap('Customer : {}'.format).replace('Customer : $', '',regex=True)
- Secondly we rename the column names for the
bot_response_datframe
as this will help in interleaving.
bot_response_split.columns = np.arange(len(customer_response_split.columns), len(customer_response_split.columns) len(bot_response_split.columns))
- Next we join the column list from
customer
andbot
dataframe to a third column list which will help us in interleavingnew_columns
new_columns = list(customer_response_split.columns) list(bot_response_split.columns)
- From the
customer
dataframe we take the column list and insert it tonew_columns
alternatively. We are doing the same forbot
dataframe column as well.
new_columns[::2] = list(customer_response_split.columns)
new_columns[1::2] = list(bot_response_split.columns)
- In the end, we concat our
customer
andbot
dataframe and re order our indexes such that the columns are interleaved, then we join all the columns so that the values are interleaved and insert it to a third column indf
calledcombined
df['combined'] = pd.concat([customer_response_split, bot_response_split], axis=1).reindex(columns=new_columns).values.tolist()
Output :