Pandas Concat and remove all duplicate row values

Time:10-04

I have a DataFrame in Pandas that looks like this

data 1  data 2  data 3 
swag     swag    swag
yo       swag    hey
hey      yo      yo 

I want to concatenate these columns into one and remove any duplicate values within each row.

It should print out like this (the first row has three swags, which are duplicates, so it keeps only one swag; the next row then gives yo, swag and hey):

data (column name)
swag
yo
swag
hey
hey
yo

CodePudding user response:

Do you care about the order of your values? If yes:

df.apply(lambda x: dict.fromkeys(x), axis=1).explode()
0    swag
1      yo
1    swag
1     hey
2     hey
2      yo
dtype: object

If not:

list(map(set, df.values)) 
[{'swag'}, {'swag', 'hey', 'yo'}, {'hey', 'yo'}]

is faster, but sets do not preserve order, and the result is a list of sets rather than a single column.
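For reference, here is a minimal end-to-end sketch of the order-preserving approach. It assumes the sample frame from the question; the variable names `per_row` and `out` are only illustrative, and `list(...)` is wrapped around `dict.fromkeys` so each row yields a plain list before `explode()`.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# dict.fromkeys keeps the first occurrence of each value, in row order.
per_row = df.apply(lambda row: list(dict.fromkeys(row)), axis=1)

# explode() turns each list element into its own row; reset_index(drop=True)
# replaces the repeated row labels (0, 1, 1, 1, 2, 2) with 0..5.
out = per_row.explode().reset_index(drop=True).to_frame("data")

print(out)
```

If the original row labels are useful (to see which source row each value came from), drop the `reset_index(drop=True)` step.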

CodePudding user response:

You can use pandas.DataFrame.stack :

out = (
    df.stack()
      .reset_index()
      .drop_duplicates(subset=['level_0', 0], keep='first')
      .rename(columns={0: 'data'})
      .drop(columns=['level_0', 'level_1'])
      .reset_index(drop=True)
)

# Output :

print(out)

   data
0  swag
1    yo
2  swag
3   hey
4   hey
5    yo
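A self-contained variant of the stack approach, as a sketch under the assumption that the frame matches the question's sample. It names the value column directly with `reset_index(name="data")` instead of renaming `0` afterwards; `level_0` is the label pandas gives the unnamed row level of the stacked index.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# stack() lays the values out row by row; reset_index(name=...) exposes the
# original row label as "level_0" and the values as "data".
stacked = df.stack().reset_index(name="data")

# A value is a duplicate only if it repeats within the same original row.
out = (
    stacked.drop_duplicates(subset=["level_0", "data"], keep="first")[["data"]]
           .reset_index(drop=True)
)

print(out)
```

On this sample it produces the same values, in the same order, as the `dict.fromkeys` answer.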