Home > Software design >  Merge two dataframes with id
Merge two dataframes with id

Time:11-04

I want to merge two dataframes. But when I do the following I got KeyError: "['available'] not in index". I looked at Python Pandas merge only certain columns . But what I'm doing wrong?

d = { 'listing_id': [1,2,3,4],
      'month': [1, 2, 3, 4],
      'price': [79.00, 80.00, 90.00, 20.00]}
df = pd.DataFrame(data=d)

d2 = {'id': [1, 2, 3, 4],
     'available': [5000, 8000,5000, 7000],
     'someotherstuff': [2,3,4,5]}
df2 = pd.DataFrame(data=d2)

df = pd.merge(df,df2[['id','available']],on='listing_id', how='left')

What I want

  listing_id  month  price   available 
0           1      1   79.0  5000
1           2      2   80.0  8000
2           3      3   90.0  5000
3           4      4   20.0  7000

CodePudding user response:

You are telling pandas to merge, on = 'listing_id', but do not have a listing_id in df2.

Change the id to listing_id and this should work. Also, no need to specify what columns you want to merge (no need for df2[['id','available']].

d = { 'listing_id': [1,2,3,4],
      'month': [1, 2, 3, 4],
      'price': [79.00, 80.00, 90.00, 20.00]}
df = pd.DataFrame(data=d)
print(df['listing_id'])

d2 = {'listing_id': [1, 2, 3, 4],
     'available ': [5000, 8000,5000, 7000]}
df2 = pd.DataFrame(data=d2)

df = pd.merge(df,df2,on = 'listing_id', how='left')
print(df)

The output:

Name: listing_id, dtype: int64
   listing_id  month  price  available 
0           1      1   79.0        5000
1           2      2   80.0        8000
2           3      3   90.0        5000
3           4      4   20.0        7000

CodePudding user response:

Your solution won't work because your ID columns have different names. Try this:

df = pd.merge(df, df2, left_on='listing_id', right_on='id')

CodePudding user response:

Firstly there is an extra space in your column available, so strip that out.

df2.columns
Out[10]: Index(['id', 'available '], dtype='object')

df2.columns = [col.strip() for col in df2.columns]
Out[15]: Index(['id', 'available'], dtype='object')

Then, your column that you want the merge to happen on is called differently in the two dataframes, so you need to specify left_on = and right_on = in the merge command:

pd.merge(df,df2[['id','available']],left_on='listing_id', right_on = 'id',how='left').drop('id',axis=1)

   listing_id  month  price  available
0           1      1   79.0       5000
1           2      2   80.0       8000
2           3      3   90.0       5000
3           4      4   20.0       7000
  • Related