Hi, I am trying to do one-hot encoding in Orange in order to conduct market basket analysis.
Currently I have transaction data as follows in my CSV:
C# | Items | | |
---|---|---|---|
C1 | Apple | Orange | |
C2 | Baby Milk | Apple | Orange |
I would like to know what steps I can take, in Orange or other software, to get my data into this form:
C# | Apple | Orange | Baby Milk |
---|---|---|---|
C1 | 1 | 1 | 0 |
C2 | 1 | 1 | 1 |
Currently, when I try to preprocess the data in Orange using "Continuize Discrete Variables: One feature per value", I get a separate column for each value of each original column instead of one column per item.
User response:
It is not entirely straightforward, but you could concatenate each customer's products into one string separated by a comma or semicolon, load that into Corpus, tokenize on the separator character with a Regexp tokenizer, and then use Bag of Words from the Text add-on. Each product then becomes its own column. I have tried the result with the Associate add-on, and it seems to work.
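Since the question also allows other software, here is a minimal pandas sketch of the same idea, done outside Orange. The column names `Item1` to `Item3` and the inline sample are assumptions made for illustration, so adjust them to match your actual CSV. It produces both the semicolon-joined string per customer (for the Corpus route above) and the one-hot table directly.

```python
import io

import pandas as pd

# Hypothetical sample mirroring the question's CSV: one row per customer,
# items spread over a variable number of columns (names are assumed).
raw = io.StringIO(
    "C#,Item1,Item2,Item3\n"
    "C1,Apple,Orange,\n"
    "C2,Baby Milk,Apple,Orange\n"
)
df = pd.read_csv(raw)

# Long format: one (customer, item) pair per row, with empty cells dropped.
long = df.melt(id_vars="C#", value_name="item").dropna(subset=["item"])

# Route 1: one semicolon-delimited string per customer, which can be saved
# and loaded into Orange's Corpus widget for tokenization.
joined = long.groupby("C#")["item"].agg(";".join)

# Route 2: the one-hot table directly. crosstab counts occurrences, so
# clip to 1 in case a customer bought the same item more than once.
one_hot = pd.crosstab(long["C#"], long["item"]).clip(upper=1)

print(joined)
print(one_hot)
```

You can write `one_hot` out with `one_hot.to_csv(...)` and load it into Orange as a regular table. Note that the columns come out in alphabetical order rather than in the order shown in the question.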