Python: Expand Pandas Series in Dataframe and remove duplicates

Time:09-23

I have something like this (except many more rows):

     col1   col2                    col3
0    xyz1    3.9              ['A', 'B']
1    xyz2    8.0    ['C', 'A', 'C', 'D']

I want to make it look something like this:

     col1   col2   col3
0    xyz1    3.9    'A'
1    xyz1    3.9    'B'
2    xyz2    8.0    'A'
3    xyz2    8.0    'C'
4    xyz2    8.0    'D'

EDIT: There could be duplicates in the series (like 'C' above), which I want to remove. Essentially, this should take the pandas.core.series.Series (not a list) in col3 and flatten it into strings spread over multiple rows. Is there an easy way to do this?

CodePudding user response:

Use explode to give each element of col3 its own row, then drop the duplicate rows and rebuild the index:

df = df.explode('col3').drop_duplicates().reset_index(drop=True)
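As a minimal sketch of that one-liner on the question's sample data: the frame below is rebuilt by hand, and it stores plain lists in col3 as a stand-in for the Series objects described in the question (the pandas documentation lists Series among the list-likes that explode handles, but that case is not exercised here).

```python
import pandas as pd

# Sample data from the question; lists stand in for the Series in col3 (assumption).
df = pd.DataFrame({
    "col1": ["xyz1", "xyz2"],
    "col2": [3.9, 8.0],
    "col3": [["A", "B"], ["C", "A", "C", "D"]],
})

# One row per element, then drop rows that repeat the same (col1, col2, col3).
out = df.explode("col3").drop_duplicates().reset_index(drop=True)
print(out)
```

The rows keep the order in which elements first appear, so xyz2 comes out as C, A, D rather than A, C, D. If the sorted order shown in the question matters, adding .sort_values(['col1', 'col3']) before reset_index is one option.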

CodePudding user response:

I just wrote a loop that treats each value in col3 as a string, strips the brackets and quotes, and splits it on commas into a list. Then you can use explode.

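tmp_list = []
for i in result_df['col3'].values:
    # Strip brackets and quotes from the string form, e.g. "['A', 'B']" -> "A, B"
    str1 = i.replace(']','').replace('[','').replace("'", '').replace('"','')
    # Split on commas and trim the space left after each comma
    op = [s.strip() for s in str1.split(",")]
    tmp_list.append(op)

result_df['col3'] = tmp_list
result_df = result_df.explode('col3').drop_duplicates().reset_index(drop=True)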

This worked for me, and I hope it works for anyone else who needs it. There must be faster ways to do this without a loop, though.
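One possible way to avoid the hand-written loop, sketched under the assumption that every value in col3 is the string form of a Python list (for example "['A', 'B']"), is to let ast.literal_eval do the parsing. The sample frame here is made up to match the question. Note that map still calls Python once per row, so this mainly removes the manual string handling and is not guaranteed to be faster.

```python
import ast

import pandas as pd

# Hypothetical input: col3 holds strings that look like Python lists (assumption).
result_df = pd.DataFrame({
    "col1": ["xyz1", "xyz2"],
    "col2": [3.9, 8.0],
    "col3": ["['A', 'B']", "['C', 'A', 'C', 'D']"],
})

# Parse each string into a real list, then explode and de-duplicate as before.
result_df["col3"] = result_df["col3"].map(ast.literal_eval)
result_df = result_df.explode("col3").drop_duplicates().reset_index(drop=True)
print(result_df)
```

ast.literal_eval only accepts Python literals, so it raises an error on malformed input instead of quietly producing odd tokens.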
