I have something like this (except many more rows):
   col1  col2                  col3
0  xyz1   3.9            ['A', 'B']
1  xyz2   8.0  ['C', 'A', 'C', 'D']
I want to make it look something like this:
   col1  col2 col3
0  xyz1   3.9  'A'
1  xyz1   3.9  'B'
2  xyz2   8.0  'A'
3  xyz2   8.0  'C'
4  xyz2   8.0  'D'
EDIT: There could be duplicates in the series (like 'C' above), which I want to remove. Essentially, each value in col3 is a pandas.core.series.Series (not a list), and I want to flatten it so that every element becomes a string in its own row. Is there an easy way to do this?
CodePudding user response:
Use explode, then drop the duplicate rows and reset the index:
df = df.explode('col3').drop_duplicates().reset_index(drop=True)
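For anyone who wants to try this on the sample data from the question, here is a minimal sketch. It assumes col3 holds Python lists; the DataFrame is rebuilt by hand from the two rows shown above, and the name `out` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (col3 assumed to hold lists)
df = pd.DataFrame({
    "col1": ["xyz1", "xyz2"],
    "col2": [3.9, 8.0],
    "col3": [["A", "B"], ["C", "A", "C", "D"]],
})

# One row per list element, then drop repeated rows and renumber the index
out = df.explode("col3").drop_duplicates().reset_index(drop=True)
print(out)
```

Note that drop_duplicates keeps the first occurrence, so the elements stay in their original order within each group rather than being sorted.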
CodePudding user response:
I just wrote a loop that strips the brackets and quotes from each value in col3 and splits it on commas, so that each cell becomes a list of strings. Then you can use explode.
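tmp_list = []
for i in result_df['col3'].values:
    # Strip brackets and quotes from the string form, e.g. "['A', 'B']" -> "A, B"
    str1 = i.replace(']', '').replace('[', '').replace("'", '').replace('"', '')
    # Split on commas and trim the space left after each comma
    op = [s.strip() for s in str1.split(',')]
    tmp_list.append(op)
result_df['col3'] = tmp_list
result_df = result_df.explode('col3').drop_duplicates().reset_index(drop=True)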
This worked for me, and I hope it works for anyone else who needs it. There must be faster ways to do this without a loop, though.
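One possible loop-free variant is sketched below. It assumes, as in the loop above, that each value in col3 is a string such as "['A', 'B']" with single-quoted items and no quotes inside the items; the sample DataFrame is made up to match the question. It uses Series.str.findall to pull out everything between single quotes.

```python
import pandas as pd

# Hypothetical sample: col3 holds the string form of each list
result_df = pd.DataFrame({
    "col1": ["xyz1", "xyz2"],
    "col2": [3.9, 8.0],
    "col3": ["['A', 'B']", "['C', 'A', 'C', 'D']"],
})

# Extract the text between each pair of single quotes into a list per cell
result_df["col3"] = result_df["col3"].str.findall(r"'([^']*)'")

# One row per element, then drop repeated rows and renumber the index
result_df = result_df.explode("col3").drop_duplicates().reset_index(drop=True)
print(result_df)
```

If the items can be double-quoted or contain commas and quotes of their own, the regular expression would need adjusting.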