Sensibly merging two dataframes

Time:12-06

If one of my dataframes gives me some info about items:

    itemId     property_1      property_2     property_n       Decision
 0      i1          88.90             NaN              0              1
 1      i2          87.09    7.653800e+06              0              0
 2      i3          78.90    7.623800e+06              1              1
 3      i4          93.02             NaN              1              0
 ...

And the other one gives me some info about how users interacted with the items:

     userId        itemId      Decision
  0      u1            i1             0
  1      u1            i2             1
  2      u2            i1             1
  3      u2            i3             0
  4      u2            i4             1
  5      u3            i5             0
    ...

I am interested in predicting the Decision, which is easy to do if I work with each dataframe separately. But can I somehow incorporate the second one into the first, given that each item appears in the second one multiple times with different Decisions?

I would like to have something like:

    itemId     property_1      property_2     property_n     u1_decision  ...    Decision
  0     i1          88.90             NaN              0               0               1
  1     i2          87.09    7.653800e+06              0               1               0
  2     i3          78.90    7.623800e+06              1             NaN               1
  3     i4          93.02             NaN              1             NaN               0
   ...

So each user becomes a column, resulting in something very sparse. The first question is whether this makes sense; the second is how to merge the rows from the second dataframe as columns into the first one (I know how to df.merge on Decision, but this doesn't give me the desired result).

CodePudding user response:

You can pivot the second dataframe so that each user becomes a column:

df.pivot(index='itemId', columns='userId', values='Decision').reset_index()

Then merge the result with the first dataframe on itemId.
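
Putting the pivot and the merge together, a minimal sketch could look like the following. The frame names `items` and `interactions`, the toy data (copied from the visible rows of the question), and the `_decision` column suffix are assumptions for illustration, not something fixed by the question. Note that `pivot` raises an error if the same (itemId, userId) pair occurs more than once; in that case `pivot_table` with an explicit `aggfunc` would be one alternative.

```python
import numpy as np
import pandas as pd

# Toy copies of the two frames from the question.
# The names `items` and `interactions` are assumed for this sketch.
items = pd.DataFrame({
    "itemId": ["i1", "i2", "i3", "i4"],
    "property_1": [88.90, 87.09, 78.90, 93.02],
    "property_2": [np.nan, 7.6538e6, 7.6238e6, np.nan],
    "property_n": [0, 0, 1, 1],
    "Decision": [1, 0, 1, 0],
})
interactions = pd.DataFrame({
    "userId": ["u1", "u1", "u2", "u2", "u2", "u3"],
    "itemId": ["i1", "i2", "i1", "i3", "i4", "i5"],
    "Decision": [0, 1, 1, 0, 1, 0],
})

# One row per item, one column per user.
# (item, user) pairs that never occur become NaN.
wide = interactions.pivot(index="itemId", columns="userId", values="Decision")

# Rename u1 -> u1_decision etc. so the user columns cannot clash
# with the item-level "Decision" column, then restore itemId as a column.
wide = wide.add_suffix("_decision").reset_index()
wide.columns.name = None  # drop the leftover "userId" label on the column axis

# A left merge keeps every row of `items`; items with no interactions get NaN,
# and interactions for unknown items (i5 here) are dropped.
merged = items.merge(wide, on="itemId", how="left")
print(merged)
```

Whether `how="left"` is the right choice depends on whether items that appear only in the second dataframe (such as i5 above) should be kept; `how="outer"` would retain them with NaN properties.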
