One of my dataframes gives me information about items:
   itemId  property_1    property_2  property_n  Decision
0      i1       88.90           NaN           0         1
1      i2       87.09  7.653800e+06           0         0
2      i3       78.90  7.623800e+06           1         1
3      i4       93.02           NaN           1         0
...
And the other one gives me some info about how users interacted with the items:
   userId  itemId  Decision
0      u1      i1         0
1      u1      i2         1
2      u2      i1         1
3      u2      i3         0
4      u2      i4         1
5      u3      i5         0
...
I am interested in predicting Decision, which is easy to do if I work with each dataframe separately. But can I somehow incorporate the second one into the first, given that in the second one each item appears multiple times with different Decisions?
I would like to have something like:
   itemId  property_1    property_2  property_n  u1_decision  ...  Decision
0      i1       88.90           NaN           0            0  ...         1
1      i2       87.09  7.653800e+06           0            1  ...         0
2      i3       78.90  7.623800e+06           1          NaN  ...         1
3      i4       93.02           NaN           1          NaN  ...         0
...
So each user becomes a column, resulting in something very sparse. My first question is whether this makes sense; my second is how to merge the rows of the second dataframe into the first one as columns (I know how to df.merge on Decision, but that doesn't give me the desired result).
Answer:
You can pivot the second table (called df here) so that each user becomes a column:
df.pivot(index='itemId', columns='userId', values='Decision').reset_index()
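A minimal, self-contained sketch of that pivot on the sample rows from the question, assuming the interaction table is named `interactions` (a hypothetical name). Note that `DataFrame.pivot` raises a `ValueError` if the same (itemId, userId) pair appears more than once; in that case `pivot_table` with an explicit `aggfunc` is one way to collapse the duplicates (`"max"` here is an arbitrary choice, not something the question specifies).

```python
import pandas as pd

# Interaction table from the question (only the rows shown there).
interactions = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "itemId": ["i1", "i2", "i1", "i3", "i4", "i5"],
    "Decision": [0, 1, 1, 0, 1, 0],
})

# One row per item, one column per user; missing (item, user) pairs become NaN.
wide = interactions.pivot(index="itemId", columns="userId", values="Decision")

# If a user can interact with the same item more than once, pivot() fails;
# pivot_table() aggregates the duplicates instead ("max" is an arbitrary choice).
wide_agg = interactions.pivot_table(
    index="itemId", columns="userId", values="Decision", aggfunc="max"
)

print(wide)
```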
Then merge the result into the first table on itemId (for example with how='left', so that items no user interacted with are kept).
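Putting both steps together, here is a minimal end-to-end sketch that rebuilds the two sample tables from the question (only the rows shown; the `items` and `interactions` names are hypothetical, and the property_2 values are taken as 7.653800e+06 and 7.623800e+06). It adds a `_decision` suffix to match the desired layout and uses a left join so every item in the first table is kept.

```python
import pandas as pd

nan = float("nan")

# Item table from the question; "Decision" is the prediction target.
items = pd.DataFrame({
    "itemId": ["i1", "i2", "i3", "i4"],
    "property_1": [88.90, 87.09, 78.90, 93.02],
    "property_2": [nan, 7.6538e6, 7.6238e6, nan],
    "property_n": [0, 0, 1, 1],
    "Decision": [1, 0, 1, 0],
})

# Interaction table from the question.
interactions = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "itemId": ["i1", "i2", "i1", "i3", "i4", "i5"],
    "Decision": [0, 1, 1, 0, 1, 0],
})

# Users become columns named e.g. "u1_decision"; rename_axis drops the
# "userId" label pandas attaches to the column axis after pivoting.
user_cols = (
    interactions.pivot(index="itemId", columns="userId", values="Decision")
    .add_suffix("_decision")
    .rename_axis(columns=None)
    .reset_index()
)

# Left join: keep every item, fill missing user decisions with NaN.
merged = items.merge(user_cols, on="itemId", how="left")

# Move the target column to the end, as in the desired layout.
merged = merged[[c for c in merged.columns if c != "Decision"] + ["Decision"]]
print(merged)
```

With many users most of the `*_decision` columns will be NaN, which is the sparsity the question anticipates. Item i5 does not appear in the result because it is not in the item table.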