Merge two columns of pandas dataframe containing string representation of lists

Time:07-21

I have a dataframe df whose columns contain string representations of lists.

import pandas as pd

data = {'A': [['ABCD'], ['PQRS'], ['LMNOP']], 'B':[['YUIO', 'DFGH'], ['QWERT', 'CVDF', 'WERT'], ['BCLF', 'DASE', 'OPIU', 'RTYU']]}

df = pd.DataFrame(data)

df['A'] = df['A'].astype('str') #Intentionally added to mimic the actual problem.
df['B'] = df['B'].astype('str') #Intentionally added to mimic the actual problem.

df.head()


     |     A      |    B                                  | 
      ------------ --------------------------------------- 
     |  ['ABCD']  | ['YUIO', 'DFGH']                      | 
     |  ['PQRS']  | ['QWERT', 'CVDF', 'WERT']             |  
     | ['LMNOP']  | ['BCLF', 'DASE', 'OPIU', 'RTYU']      |  
      ------------ --------------------------------------- 

I'm trying to merge the list in column A with the list in column B using the code below. However, because both columns hold strings, the result is a plain string concatenation rather than a merged list.


 df['B'] = df['A'] + df['B']

     |     A      |    B                                          | 
      ------------ ----------------------------------------------- 
     |  ['ABCD']  | ['ABCD']['YUIO', 'DFGH']                      | 
     |  ['PQRS']  | ['PQRS']['QWERT', 'CVDF', 'WERT']             |  
     | ['LMNOP']  | ['LMNOP']['BCLF', 'DASE', 'OPIU', 'RTYU']     |  
      ------------ ----------------------------------------------- 

I'm expecting output like the table below, which I can't get with the code above.


     |     A      |    B                                          | 
      ------------ ----------------------------------------------- 
     |  ['ABCD']  | ['ABCD', 'YUIO', 'DFGH']                      | 
     |  ['PQRS']  | ['PQRS', 'QWERT', 'CVDF', 'WERT']             |  
     | ['LMNOP']  | ['LMNOP', 'BCLF', 'DASE', 'OPIU', 'RTYU']     |  
      ------------ ----------------------------------------------- 

Is there a way to merge two columns containing string representations of lists and get output like the above?

CodePudding user response:

You don't need to convert them to strings. With real Python lists, + concatenates the lists row by row:

data = {'A': [['ABCD'], ['PQRS'], ['LMNOP']], 'B':[['YUIO', 'DFGH'], ['QWERT', 'CVDF', 'WERT'], ['BCLF', 'DASE', 'OPIU', 'RTYU']]}

df = pd.DataFrame(data)

df['B'] = df['A'] + df['B']
print(df)

         A                                B
0   [ABCD]               [ABCD, YUIO, DFGH]
1   [PQRS]        [PQRS, QWERT, CVDF, WERT]
2  [LMNOP]  [LMNOP, BCLF, DASE, OPIU, RTYU]

If the columns are strings to begin with, you can use .apply(pd.eval) or .apply(ast.literal_eval) to convert each value back to a Python list before concatenating.

df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')

df['A'] = df['A'].apply(pd.eval)
df['B'] = df['B'].apply(pd.eval)
# or, instead of pd.eval (use one approach, not both):
import ast
df['A'] = df['A'].apply(ast.literal_eval)
df['B'] = df['B'].apply(ast.literal_eval)

df['B'] = df['A'] + df['B']
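
For reference, the ast.literal_eval route can be sketched end to end as below. This is only a minimal sketch: the helper name merge_list_strings and the extra column B_str are made up for illustration and are not part of pandas, and it assumes every cell is a valid Python list literal.

```python
import ast

import pandas as pd

# Same sample data as in the question, already stored as strings.
df = pd.DataFrame({
    'A': ["['ABCD']", "['PQRS']", "['LMNOP']"],
    'B': ["['YUIO', 'DFGH']",
          "['QWERT', 'CVDF', 'WERT']",
          "['BCLF', 'DASE', 'OPIU', 'RTYU']"],
})


def merge_list_strings(left, right):
    """Hypothetical helper: parse two list literals and concatenate the lists."""
    return ast.literal_eval(left) + ast.literal_eval(right)


# Build the merged lists row by row, keeping the original index.
merged = [merge_list_strings(a, b) for a, b in zip(df['A'], df['B'])]
df['B'] = pd.Series(merged, index=df.index)

# Optional: turn the merged lists back into strings, as in the question.
df['B_str'] = df['B'].apply(str)

print(df['B'].tolist())
print(df['B_str'].tolist())
```

ast.literal_eval only accepts Python literals, so a cell that is not a valid list literal raises an exception instead of being evaluated as code.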

CodePudding user response:

You can use string manipulation: remove the last character of A, remove the first character of B then join them:

df['B'] = df['A'].str[:-1] + ', ' + df['B'].str[1:]
print(df)

# Output
           A                                          B
0   ['ABCD']                   ['ABCD', 'YUIO', 'DFGH']
1   ['PQRS']          ['PQRS', 'QWERT', 'CVDF', 'WERT']
2  ['LMNOP']  ['LMNOP', 'BCLF', 'DASE', 'OPIU', 'RTYU']
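
Note that the slicing trick assumes both lists are non-empty: if either cell is '[]', the result contains a stray ', '. A possible variation that handles empty lists is sketched below. The sample rows with '[]' are invented for illustration, and the sketch assumes each cell starts with '[' and ends with ']' with no surrounding whitespace.

```python
import pandas as pd

# Illustrative data: some cells hold an empty list.
df = pd.DataFrame({
    'A': ["['ABCD']", "[]", "['LMNOP']"],
    'B': ["['YUIO', 'DFGH']", "['QWERT']", "[]"],
})

# Strip the surrounding brackets to get each list's inner text.
inner_a = df['A'].str[1:-1]
inner_b = df['B'].str[1:-1]

# Insert the ', ' separator only where both sides are non-empty.
both_filled = (inner_a != '') & (inner_b != '')
sep = pd.Series(', ', index=df.index).where(both_filled, '')

df['B'] = '[' + inner_a + sep + inner_b + ']'
print(df['B'].tolist())
```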