Reversing one-hot encoding without reducing columns to one

Time:12-11

I have a dataset which shows each transaction as a row.

For example:

Item_1 Item_2 Item_3
NaN 1 1
1 1 NaN

The table has 611 columns and 1180 rows, i.e. 611 items and 1180 transactions.

I want to run a basket analysis, so I need every cell that contains 1 to be replaced with the name of its item (the column name).

For example:

Item_1 Item_2 Item_3
NaN Item_2 Item_3
Item_1 Item_2 NaN

Then I aim to delete the header row and have each transaction's items left-aligned on its row, with any NaNs pushed to the end, i.e.:

No_header No_header No_header
Item_2 Item_3 NaN
Item_1 Item_2 NaN

CodePudding user response:

Try this:

# Replace each 1 with its column name, then collect the non-NaN names of every row into a list
items = df.apply(lambda col: col.map({1: col.name})).apply(lambda row: row[~row.isna()].tolist(), axis=1)

Output:

>>> items
0    [Item_2, Item_3]
1    [Item_1, Item_2]
dtype: object

>>> type(items)
pandas.core.series.Series
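
The result above is a Series of lists, not the left-aligned, header-less table the question asks for. One possible follow-up, as a rough sketch rather than a definitive solution, is to build a new DataFrame from those lists. The sample data below is an assumption: it mirrors the question's example plus one extra single-item row to show the padding. The names `basket` and `baskets.csv` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data modelled on the question (assumed; the real table is 1180 x 611).
# The third row is an extra one-item transaction added to show the padding.
df = pd.DataFrame(
    {
        "Item_1": [np.nan, 1, 1],
        "Item_2": [1, 1, np.nan],
        "Item_3": [1, np.nan, np.nan],
    }
)

# Same step as the answer: replace each 1 with its column name,
# then collect the non-missing names of every row into a list.
items = df.apply(lambda col: col.map({1: col.name})).apply(
    lambda row: row[~row.isna()].tolist(), axis=1
)

# Building a DataFrame from the lists left-aligns the names. Rows shorter
# than the longest transaction are padded on the right with missing values.
basket = pd.DataFrame(items.tolist(), index=items.index)
print(basket)

# To save the table without header or index (hypothetical file name):
# basket.to_csv("baskets.csv", header=False, index=False)
```

Note that `basket` is only as wide as the longest transaction, so it will usually have far fewer than 611 columns. Many basket-analysis tools accept a plain list of lists instead, in which case `items.tolist()` may already be enough.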