I have a question about iterating over rows and building combinations in pandas.
I have a DataFrame like this:
df1:
flavor_1  flavor_2  flavor_3  flavor_4  flavor_5  flavor_6  flavor_7  flavor_8  flavor_9  price  id
Lime      Lemon     Grape     Grass     Nan       Nan       Nan       Nan       Nan       80     1
Lime      Peach     Nan       Nan       Nan       Nan       Nan       Nan       Nan       89     2
Lime      Plum      Grape     Grass     Vanilla   Plum      Fig       Olive     Cherry    81     3
Lime      Black     Grape     Grass     Plum      Fig       Nan       Nan       Nan       84     4
.
.
.
.
Lime      Lemon     Grape     Grass     Nan       Nan       Nan       Nan       Nan       80     300
I want to generate a new data frame with all possible combinations of two columns in the SAME ROW. For example,
df_new:
Target  Source  Price  id
Lemon   Grape   80     1
Lemon   Grass   80     1
Grape   Grass   80     1
Plum    Grape   81     3
Plum    Grass   81     3
.
.
.
.
Lemon   Grape   80     300
Lemon   Grass   80     300
Grape   Grass   80     300
So I tried this code:
import itertools

def comb(df1):
    return [df1.loc[:, list(x)].set_axis(['Target','Source'], axis=1)
            for x in itertools.combinations(df1.columns, 2)]
However, I couldn't get the DataFrame I need.
I want not only the combinations of the columns within the same row, but also the price and id of that row. I also want to drop any combination that contains NaN. Is there a way to do this?
Thanks in advance!
CodePudding user response:
Use:
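import itertools
import pandas as pd

# Index by price/id and stack the flavor columns into one long Series
# (stack() drops NaN by default in pandas 1.x/2.x), then build every
# 2-item combination within each (price, id) group.
df = (df.set_index(['price','id'])
        .stack()
        .groupby(level=[0,1])
        .apply(lambda x: pd.DataFrame(itertools.combinations(x, 2),
                                      columns=['Target','Source']))
        .reset_index(level=-1, drop=True)
        .reset_index()[['Target','Source','price','id']])
print(df)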
   Target Source  price  id
0    Lime  Lemon     80   1
1    Lime  Grape     80   1
2    Lime  Grass     80   1
3   Lemon  Grape     80   1
4   Lemon  Grass     80   1
..    ...    ...    ...  ..
59  Grape    Fig     84   4
60  Grass   Plum     84   4
61  Grass    Fig     84   4
62   Plum    Fig     84   4
63   Lime  Peach     89   2
[64 rows x 4 columns]
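For anyone who wants to try this end to end, here is a minimal self-contained sketch. It is not the stack/groupby chain above but a plain loop over rows that produces the same kind of result. The sample frame is rebuilt from the four rows shown in the question, assuming the missing flavors are real NaN values rather than the string "Nan"; the helper name `row_pairs` and its parameters are made up for illustration.

```python
import itertools

import numpy as np
import pandas as pd

# Sample rows copied from the question; missing flavors are assumed to be real NaN.
flavor_cols = [f"flavor_{i}" for i in range(1, 10)]
rows = [
    ["Lime", "Lemon", "Grape", "Grass"] + [np.nan] * 5 + [80, 1],
    ["Lime", "Peach"] + [np.nan] * 7 + [89, 2],
    ["Lime", "Plum", "Grape", "Grass", "Vanilla", "Plum", "Fig", "Olive", "Cherry", 81, 3],
    ["Lime", "Black", "Grape", "Grass", "Plum", "Fig"] + [np.nan] * 3 + [84, 4],
]
df1 = pd.DataFrame(rows, columns=flavor_cols + ["price", "id"])


def row_pairs(df, value_cols, keep_cols):
    """Hypothetical helper: one output row per 2-combination of the
    non-missing values in each input row, carrying keep_cols along."""
    records = []
    for _, row in df.iterrows():
        # Drop NaN explicitly so no pair ever contains a missing value.
        values = row[value_cols].dropna().tolist()
        for target, source in itertools.combinations(values, 2):
            records.append((target, source, *row[keep_cols]))
    return pd.DataFrame(records, columns=["Target", "Source", *keep_cols])


pairs = row_pairs(df1, flavor_cols, ["price", "id"])
print(pairs.head())
```

Two differences from the chained version are worth knowing. The loop handles each row on its own, so two rows that happen to share the same price and id are not merged into one group. It also calls `dropna()` explicitly instead of relying on the default behavior of `stack()`. In both versions a flavor that appears twice in a row (Plum in id 3) yields a pair with itself; deduplicate `values` first if that is not wanted.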