Home > Software engineering >  python replace cols values in dataframe given a secondary df. # Check for duplicates KeyError: [dupl
python replace cols values in dataframe given a secondary df. # Check for duplicates KeyError: [dupl

Time:09-28

Given df1 and df2 dataframes, I would like to substitute the 'Name' items in df2 for their corresponding 'Area' values in df1.

In other words, the values in df2['Name] should be respectively A1 A3 A2 A1 A2

import pandas as pd

df1 = pd.DataFrame({'Name':['Tom', 'nick', 'Albert', 'Josh'],
            'Area':['A1','A2','A3','A1']
                       })
df2 = pd.DataFrame({'Name':['Tom', 'Albert', 'Nick', 'Josh','Nick'],
            'pet':['dog','cat','mouse','dog','mouse']
                       })

After searching for some suggestions, I added this line:

df = df2.merge(df1, on=['Area','Name'], how='left')

print(df)

But I get this error:

# Check for duplicates

KeyError: 'Area'

and this line:

df = df2['Name'].map(df1.set_index('Area')['Name'])
print(df)

outputs this error

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

CodePudding user response:

That's because both dataframes should have a column named Area.

So remove the Area when merging.

df = df2.merge(df1, on=['Name'], how='left')

CodePudding user response:

You do not have any column named 'Area' in your second dataframe (df2). This is why the error is being shown.

CodePudding user response:

You can use dict and zip commands to make the 2 dataframe columns you want to map into a dictionary. Then you can map the values easily.

import pandas as pd

df1 = pd.DataFrame({
     'Name':['Tom', 'nick', 'Albert', 'Josh'],
     'Area':['A1','A2','A3','A1']
})

df2 = pd.DataFrame({
    'Name':['Tom', 'Albert', 'Nick', 'Josh','Nick'],
    'pet':['dog','cat','mouse','dog','mouse']
})

df2['Name'] = df2['Name'].map(dict(zip(df1['Name'], df1['Area'])))
  • Related