Home > Back-end >  Find source of Key Error in pandas dataframe merge
Find source of Key Error in pandas dataframe merge

Time:07-19

I am getting a KeyError: 'Cust_id_2' when I try to merge the following dataframes.

df =

Cust_id year    is_sub
0   4   1516    is_sub
1   4   1920    is_sub
2   4   1819    is_sub
3   4   1718    is_sub
4   4   1617    is_sub

df2 =

    Cust_id_2   year_freq_score
0   4   9.0
1   5   6.0
2   7   10.0
3   8   2.0
4   10  1.0

Most recently I have tried this:

result = pd.merge(
    df, 
    df2[['year_freq_score']],
    how='left',
    left_on='Cust_id',
    right_on='Cust_id_2'
)

df has 14,000 rows. df2 has 3,000 rows. df2 is a pivot table derived from df.

My first version had the Cust_id as the index of df2 and i tried to use 'right_index=True' which gave a KeyError.

I then reset the index and used 'on' columns having the same name (on='Cust_id) which gave KeyError: 'Cust_id'.

I then and changed df2 to ''Cust_id_2'' to isolate where the error was coming from and now receive 'KeyError: 'Cust_id_2''.

I've read through multiple posts on 'KeyError' but have not found (or understood) the solution to this issue.

Any help or pointers in the right direction greatly appreciated.

CodePudding user response:

You slice df2 to only keep year_freq_score, so there is no more Cust_id_2 column for the merge.

Do instead:

result = pd.merge(
    df, 
    df2,
    how='left',
    left_on='Cust_id',
    right_on='Cust_id_2'
)
  • Related