I have a problem. I want to merge two dataframes. The one df_neighbourhood_one
is a one hot encoding matrix and the df
is the normal dataframe. The Problem is when I merge the two dataframes I only get 5 rows, but there should be 7.
d = {'host_id': [1, 1, 2, 3, 4, 5, 6],
'id': [10, 11, 20, 30, 40, 50, 60],
'neighbourhood': ["Centrum", 'West', 'West', 'North', 'Centrum', 'West', 'West'],
'something': ['some1', 'some2', 'some3', 'some4', 'some5', 'some6', 'some7']}
df = pd.DataFrame(data=d)
print(df)
print()
df_neighbourhood = df.copy()
df_neighbourhood = df_neighbourhood.set_index(['host_id', 'id'])
df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood']).head()
print(df_neighbourhood_one)
df = pd.merge(df, df_neighbourhood_one, right_index=True, left_on=['host_id', 'id'])
print(df)
print(df.shape)
# df
host_id id neighbourhood something
0 1 10 Centrum some1
1 1 11 West some2
2 2 20 West some3
3 3 30 North some4
4 4 40 Centrum some5
5 5 50 West some6
6 6 60 West some7
# df_neighbourhood_one
Centrum North West
host_id id
1 10 1 0 0
11 0 0 1
2 20 0 0 1
3 30 0 1 0
4 40 1 0 0
# df but merged
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
(5, 7) # df shape
What I want
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
5 5 50 West some6 0 0 1
6 6 60 West some7 0 0 1
CodePudding user response:
Just use how='left'
with fillna
.
Also, you are using head
while creating df_neighbourhood_one
. Don't use that.
In [734]: df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood'])
In [735]: pd.merge(df, df_neighbourhood_one, right_index=True, left_on=['host_id', 'id'], how='left').fillna(0)
Out[735]:
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
5 5 50 West some6 0 0 1
6 6 60 West some7 0 0 1