I have a DataFrame in Pandas that looks like this:
data 1 data 2 data 3
swag swag swag
yo swag hey
hey yo yo
I want to concatenate these columns into one and remove any duplicate values within each row.
It should print out like so (the first row contains three swags, which are duplicates, so it yields only one swag; the next row yields yo, swag and hey):
data (column name)
swag
yo
swag
hey
hey
yo
CodePudding user response:
Do you care about the order of your values? If yes:
df.apply(lambda x: dict.fromkeys(x), axis=1).explode()
0 swag
1 yo
1 swag
1 hey
2 hey
2 yo
dtype: object
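For anyone who wants to run this end to end, here is a minimal sketch. It assumes the column names are 'data 1', 'data 2' and 'data 3' as in the question, and it converts each dict to a list before exploding, which is a small variation on the one-liner above rather than the exact same call.

```python
import pandas as pd

# Sample frame rebuilt from the question; the column names are assumed.
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# dict.fromkeys keeps the first occurrence of each value, in row order.
deduped = df.apply(lambda row: list(dict.fromkeys(row)), axis=1)

# explode puts one value on each line; reset_index(drop=True) renumbers
# the rows, and to_frame names the single resulting column 'data'.
out = deduped.explode().reset_index(drop=True).to_frame("data")
print(out)
```

The reset_index and to_frame calls are only there to match the single 'data' column asked for in the question; drop them if you want to keep the original row labels.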
If not, converting each row to a set is faster:
list(map(set, df.values))
[{'swag'}, {'swag', 'hey', 'yo'}, {'hey', 'yo'}]
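To see why order is the deciding factor, here is a small pure-Python sketch of the two deduplication strategies. The rows list is a hand-written stand-in for df.values, not something taken from a real frame.

```python
# Stand-in for df.values: one inner list per DataFrame row.
rows = [
    ["swag", "swag", "swag"],
    ["yo", "swag", "hey"],
    ["hey", "yo", "yo"],
]

# A set removes duplicates but makes no promise about element order.
unordered = [set(row) for row in rows]

# dict.fromkeys also removes duplicates, and dicts preserve insertion
# order (Python 3.7+), so the first occurrence of each value wins.
ordered = [list(dict.fromkeys(row)) for row in rows]

# Flatten the ordered rows into the single column asked for.
flat = [value for row in ordered for value in row]
print(flat)
```

If you flatten the sets instead, you get the same values, but the order within each row is not guaranteed.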
CodePudding user response:
You can use pandas.DataFrame.stack:
out = (
    df.stack()
      .reset_index()
      .drop_duplicates(subset=['level_0', 0], keep='first')
      .rename(columns={0: 'data'})
      .drop(columns=['level_0', 'level_1'])
      .reset_index(drop=True)
)
# Output :
print(out)
data
0 swag
1 yo
2 swag
3 hey
4 hey
5 yo
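In case the level_0 / level_1 names look mysterious, this sketch shows the intermediate frame that the chain works on. It assumes the question's frame has a default integer index and the column names 'data 1', 'data 2', 'data 3'.

```python
import pandas as pd

# Sample frame rebuilt from the question; the column names are assumed.
df = pd.DataFrame({
    "data 1": ["swag", "yo", "hey"],
    "data 2": ["swag", "swag", "yo"],
    "data 3": ["swag", "hey", "yo"],
})

# stack() moves the columns into a second index level; reset_index()
# then turns both unnamed levels into columns called level_0 (the
# original row label) and level_1 (the original column name), with the
# values in a column named 0.
stacked = df.stack().reset_index()
print(stacked.columns.tolist())

# Dropping duplicates on (level_0, 0) keeps the first occurrence of each
# value within each original row.
kept = stacked.drop_duplicates(subset=["level_0", 0], keep="first")
print(kept[0].tolist())
```

Because level_0 is part of the subset, a value that repeats in a different row (such as 'yo' in rows 1 and 2) is kept once per row rather than once overall.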