I have a dataframe df whose columns contain string representations of lists.
import pandas as pd
data = {'A': [['ABCD'], ['PQRS'], ['LMNOP']], 'B':[['YUIO', 'DFGH'], ['QWERT', 'CVDF', 'WERT'], ['BCLF', 'DASE', 'OPIU', 'RTYU']]}
df = pd.DataFrame(data)
df['A'] = df['A'].astype('str') #Intentionally added to mimic the actual problem.
df['B'] = df['B'].astype('str') #Intentionally added to mimic the actual problem.
df.head()
| A | B |
------------ ---------------------------------------
| ['ABCD'] | ['YUIO', 'DFGH'] |
| ['PQRS'] | ['QWERT', 'CVDF', 'WERT'] |
| ['LMNOP'] | ['BCLF', 'DASE', 'OPIU', 'RTYU'] |
------------ ---------------------------------------
I'm trying to merge the list in column A with the list in column B using the code below. However, the output is not what I expect.
df['B'] = df['A'] + df['B']
| A | B |
------------ -----------------------------------------------
| ['ABCD'] | ['ABCD']['YUIO', 'DFGH'] |
| ['PQRS'] | ['PQRS']['QWERT', 'CVDF', 'WERT'] |
| ['LMNOP'] | ['LMNOP']['BCLF', 'DASE', 'OPIU', 'RTYU'] |
------------ -----------------------------------------------
I expect the output below, which I'm unable to get with the code above.
| A | B |
------------ -----------------------------------------------
| ['ABCD'] | ['ABCD', 'YUIO', 'DFGH'] |
| ['PQRS'] | ['PQRS', 'QWERT', 'CVDF', 'WERT'] |
| ['LMNOP'] | ['LMNOP', 'BCLF', 'DASE', 'OPIU', 'RTYU'] |
------------ -----------------------------------------------
Is there a way to merge two columns of string representations of lists and get the output above?
CodePudding user response:
You don't need to convert them to strings. If the columns hold actual lists, adding them concatenates the lists row by row:
data = {'A': [['ABCD'], ['PQRS'], ['LMNOP']], 'B':[['YUIO', 'DFGH'], ['QWERT', 'CVDF', 'WERT'], ['BCLF', 'DASE', 'OPIU', 'RTYU']]}
df = pd.DataFrame(data)
df['B'] = df['A'] + df['B']
print(df)
A B
0 [ABCD] [ABCD, YUIO, DFGH]
1 [PQRS] [PQRS, QWERT, CVDF, WERT]
2 [LMNOP] [LMNOP, BCLF, DASE, OPIU, RTYU]
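If the columns are originally strings, you can use .apply(pd.eval) or .apply(ast.literal_eval) to convert them to Python lists first. ast.literal_eval only accepts Python literals, so it is the safer choice when the strings come from an untrusted source.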
df['A'] = df['A'].astype('str')
df['B'] = df['B'].astype('str')
df['A'] = df['A'].apply(pd.eval)
df['B'] = df['B'].apply(pd.eval)
# or
import ast
df['A'] = df['A'].apply(ast.literal_eval)
df['B'] = df['B'].apply(ast.literal_eval)
df['B'] = df['A'] + df['B']
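Putting the ast.literal_eval route together, here is a minimal end-to-end sketch. It assumes every cell is a valid Python list literal, and it converts the merged lists back to strings at the end only to match the output format asked for in the question; the variable names a_lists, b_lists and merged are just for illustration.

```python
import ast

import pandas as pd

# Columns holding string representations of lists, as in the question.
df = pd.DataFrame({
    "A": ["['ABCD']", "['PQRS']", "['LMNOP']"],
    "B": [
        "['YUIO', 'DFGH']",
        "['QWERT', 'CVDF', 'WERT']",
        "['BCLF', 'DASE', 'OPIU', 'RTYU']",
    ],
})

# Parse each string into a real Python list.
a_lists = df["A"].apply(ast.literal_eval)
b_lists = df["B"].apply(ast.literal_eval)

# Concatenate the two lists row by row.
merged = [a + b for a, b in zip(a_lists, b_lists)]

# Optional: store the result as strings again, like the original columns.
df["B"] = [str(m) for m in merged]

print(df)
```

If you would rather keep real lists in the column, assign merged directly and skip the str() step.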
CodePudding user response:
You can use string manipulation: remove the last character of A (the closing bracket), remove the first character of B (the opening bracket), then join them with a comma:
df['B'] = df['A'].str[:-1] + ', ' + df['B'].str[1:]
print(df)
# Output
A B
0 ['ABCD'] ['ABCD', 'YUIO', 'DFGH']
1 ['PQRS'] ['PQRS', 'QWERT', 'CVDF', 'WERT']
2 ['LMNOP'] ['LMNOP', 'BCLF', 'DASE', 'OPIU', 'RTYU']
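One caveat with the slicing approach: it assumes both cells contain a non-empty list. If either side is '[]', the joined string ends up with a stray comma. The sketch below is one possible way to guard against that; merge_list_strings is a hypothetical helper written for this example, not a pandas function, and the sample data with empty lists is made up to show the edge case.

```python
import pandas as pd


def merge_list_strings(a: str, b: str) -> str:
    """Join two list-like strings, skipping sides whose brackets are empty."""
    # Drop the surrounding brackets and any padding inside them.
    a_inner = a.strip()[1:-1].strip()
    b_inner = b.strip()[1:-1].strip()
    # Keep only the non-empty sides so no stray comma is produced.
    parts = [p for p in (a_inner, b_inner) if p]
    return "[" + ", ".join(parts) + "]"


# Illustrative data that includes empty lists on either side.
df = pd.DataFrame({
    "A": ["['ABCD']", "[]", "['LMNOP']"],
    "B": ["['YUIO', 'DFGH']", "['QWERT']", "[]"],
})

df["B"] = [merge_list_strings(a, b) for a, b in zip(df["A"], df["B"])]

print(df)
```

This still treats the cells as plain text, so it never checks that the contents are valid list literals; parsing with ast.literal_eval is the more robust option when that matters.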