Home > Blockchain >  Merge deliver only 5 rows instead of 7
Merge deliver only 5 rows instead of 7

Time:11-18

I have a problem. I want to merge two dataframes. The one df_neighbourhood_one is a one hot encoding matrix and the df is the normal dataframe. The Problem is when I merge the two dataframes I only get 5 rows, but there should be 7.

d = {'host_id': [1, 1, 2, 3, 4, 5, 6],
     'id': [10, 11, 20, 30, 40, 50, 60],
     'neighbourhood': ["Centrum", 'West', 'West', 'North', 'Centrum', 'West', 'West'],
     'something': ['some1', 'some2', 'some3', 'some4', 'some5', 'some6', 'some7']}
df = pd.DataFrame(data=d)
print(df)
print()
df_neighbourhood = df.copy()
df_neighbourhood = df_neighbourhood.set_index(['host_id', 'id'])
df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood']).head()
print(df_neighbourhood_one)

df = pd.merge(df, df_neighbourhood_one,  right_index=True, left_on=['host_id', 'id'])
print(df)
print(df.shape)

# df
   host_id  id neighbourhood something
0        1  10       Centrum     some1
1        1  11          West     some2
2        2  20          West     some3
3        3  30         North     some4
4        4  40       Centrum     some5
5        5  50          West     some6
6        6  60          West     some7
# df_neighbourhood_one
            Centrum  North  West
host_id id                      
1       10        1      0     0
        11        0      0     1
2       20        0      0     1
3       30        0      1     0
4       40        1      0     0
# df but merged
   host_id  id neighbourhood something  Centrum  North  West
0        1  10       Centrum     some1        1      0     0
1        1  11          West     some2        0      0     1
2        2  20          West     some3        0      0     1
3        3  30         North     some4        0      1     0
4        4  40       Centrum     some5        1      0     0
(5, 7) # df shape

What I want

   host_id  id neighbourhood something  Centrum  North  West
0        1  10       Centrum     some1        1      0     0
1        1  11          West     some2        0      0     1
2        2  20          West     some3        0      0     1
3        3  30         North     some4        0      1     0
4        4  40       Centrum     some5        1      0     0
5        5  50          West     some6        0      0     1
6        6  60          West     some7        0      0     1

CodePudding user response:

Just use how='left' with fillna.

Also, you are using head while creating df_neighbourhood_one. Don't use that.

In [734]: df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood'])

In [735]: pd.merge(df, df_neighbourhood_one,  right_index=True, left_on=['host_id', 'id'], how='left').fillna(0)
Out[735]: 
   host_id  id neighbourhood something  Centrum  North  West
0        1  10       Centrum     some1        1      0     0
1        1  11          West     some2        0      0     1
2        2  20          West     some3        0      0     1
3        3  30         North     some4        0      1     0
4        4  40       Centrum     some5        1      0     0
5        5  50          West     some6        0      0     1
6        6  60          West     some7        0      0     1
  • Related