I have a pandas dataframe like
user_id | music_id | has_rating |
---|---|---|
A | a | 1 |
B | b | 1 |
and I would like to automatically add new rows for each of user_id & music_id for those users haven't rated, like
user_id | music_id | has_rating |
---|---|---|
A | a | 1 |
A | b | 0 |
B | a | 0 |
B | b | 1 |
for each of user_id and music_id combination pairs those are not existing in my Pandas dataframe yet.
is there any way to append such rows automatically like this?
CodePudding user response:
Try using pd.MultiIndex.from_product()
l = ['user_id','music_id']
(df.set_index(l)
.reindex(pd.MultiIndex.from_product([df[l[0]].unique(),df[l[1]].unique()],names = l),fill_value=0)
.reset_index())
Output:
user_id music_id has_rating
0 A a 1
1 A b 0
2 B a 0
3 B b 1
CodePudding user response:
You can use a temporary reshape with pivot_table
and fill_value=0
to fill the missing values with 0
:
(df.pivot_table(index='user_id', columns='music_id',
values='has_rating', fill_value=0)
.stack().reset_index(name='has_rating')
)
Output:
user_id music_id has_rating
0 A a 1
1 A b 0
2 B a 0
3 B b 1