Home > OS >  Pandas: Compare value for a column by each row between 2 dataframes and replace value in another col
Pandas: Compare value for a column by each row between 2 dataframes and replace value in another col

Time:06-22

I have 2 dataframes:

DF1:

    SICCode Industry Title
0   4941    AGRICULTURAL PRODUCTION-CROPS
1   7374    AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
2   5065    AGRICULTURAL SERVICES
3   4941    FORESTRY
4   5122    FISHING, HUNTING AND TRAPPING

DF2:

    SICCode Sector
0   4941    Buseq
1   7374    utils
2   5065    shops
3   4941    utils
4   5122    Buseq

DF1 is the master; I want to check for each SIC Code in DF1 in SIC Code in DF2 and if it is there, I want to replace the value in Sector column by the corresponding value under Industry Title. For e.g., for 4941 in DF1, it should against all values in SIC Code in DF2 and replace Buseq and utils with "Agricultural Production crops".

I tried .map, .isin, if-else, etc. but not able to move forward. Any suggestions please?

CodePudding user response:

Maybe a bit crude, but works.

df2 = df2.merge(df1, on="SICode", how="left")
df2.drop(['Sector'], inplace=True, axis=1)

CodePudding user response:

BE CAREFUL, in your master df SICCode is not a primary key, in the example your show there is two rows with the same code which could be a problem, be aware of that (I have arbitrary edited the SICCode of Forestry in the solution but if you don't Buseq and utils would be replaced by Forestry)

I see 2 solution to your issue:

  • Using only pandas:
import pandas as pd

df1 = {'SICCode':[4941,7374,5065,4944,5122], \
       'Industry_Title' :['AGRICULTURAL PRODUCTION-CROPS',\
                          'AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES',\
                          'AGRICULTURAL SERVICES','FORESTRY',\
                          'FISHING  HUNTING AND TRAPPING']}
df2 =  {  'SICCode':[4941,7374,5065,4941,5122], 'Sector': ['Buseq','utils','shops','utils','Buseq']}

df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)

for index in df2.index:

    to_replace = df1[df1['SICCode'] == df2.loc[index,'SICCode']]
    for i in to_replace['Industry_Title'].index:
        df2.loc[index,'Sector'] = to_replace['Industry_Title'][i]
  • Using sqlite3:
import pandas as pd
import sqlite3 as sql

df1 = {'SICCode':[4941,7374,5065,4944,5122], \
       'Industry_Title' :['AGRICULTURAL PRODUCTION-CROPS',\
                          'AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES',\
                          'AGRICULTURAL SERVICES','FORESTRY',\
                          'FISHING  HUNTING AND TRAPPING']}
df2 =  {  'SICCode':[4941,7374,5065,4941,5122], 'Sector': ['Buseq','utils','shops','utils','Buseq']}

df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)  

conn = sql.connect('test_database.db')
c = conn.cursor()

c.execute('CREATE TABLE IF NOT EXISTS df1 (SICCode Integer, Industry_Title Text)')
c.execute('CREATE TABLE IF NOT EXISTS df2 (SICCode Integer, Sector Text)')


df1.to_sql('df1', conn, if_exists='replace', index = False)
df2.to_sql('df2', conn, if_exists='replace', index = False)
# create the database witht the table

to_replace = c.execute("""
          Select df2.SICCode,Industry_Title From
          df1 Join df2 On df1.SICCode=df2.SICCode
          """).fetchall()
# select rows with same SICCode in df1 and df2

for x in to_replace:
    c.execute("""
              Update df2 Set Sector = '{}'
              Where SICCode = {}
              """.format(x[1],x[0]))
    #updates the row of df2

conn.commit()
conn.close()
  • Related