I have 2 dataframes:
DF1:
SICCode Industry Title
0 4941 AGRICULTURAL PRODUCTION-CROPS
1 7374 AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES
2 5065 AGRICULTURAL SERVICES
3 4941 FORESTRY
4 5122 FISHING, HUNTING AND TRAPPING
DF2:
SICCode Sector
0 4941 Buseq
1 7374 utils
2 5065 shops
3 4941 utils
4 5122 Buseq
DF1 is the master; I want to check for each SIC Code in DF1 in SIC Code in DF2 and if it is there, I want to replace the value in Sector column by the corresponding value under Industry Title. For e.g., for 4941 in DF1, it should against all values in SIC Code in DF2 and replace Buseq and utils with "Agricultural Production crops".
I tried .map, .isin, if-else, etc. but not able to move forward. Any suggestions please?
CodePudding user response:
Maybe a bit crude, but works.
df2 = df2.merge(df1, on="SICode", how="left")
df2.drop(['Sector'], inplace=True, axis=1)
CodePudding user response:
BE CAREFUL, in your master df SICCode is not a primary key, in the example your show there is two rows with the same code which could be a problem, be aware of that (I have arbitrary edited the SICCode of Forestry in the solution but if you don't Buseq and utils would be replaced by Forestry)
I see 2 solution to your issue:
- Using only pandas:
import pandas as pd
df1 = {'SICCode':[4941,7374,5065,4944,5122], \
'Industry_Title' :['AGRICULTURAL PRODUCTION-CROPS',\
'AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES',\
'AGRICULTURAL SERVICES','FORESTRY',\
'FISHING HUNTING AND TRAPPING']}
df2 = { 'SICCode':[4941,7374,5065,4941,5122], 'Sector': ['Buseq','utils','shops','utils','Buseq']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
for index in df2.index:
to_replace = df1[df1['SICCode'] == df2.loc[index,'SICCode']]
for i in to_replace['Industry_Title'].index:
df2.loc[index,'Sector'] = to_replace['Industry_Title'][i]
- Using sqlite3:
import pandas as pd
import sqlite3 as sql
df1 = {'SICCode':[4941,7374,5065,4944,5122], \
'Industry_Title' :['AGRICULTURAL PRODUCTION-CROPS',\
'AGRICULTURAL PROD-LIVESTOCK & ANIMAL SPECIALTIES',\
'AGRICULTURAL SERVICES','FORESTRY',\
'FISHING HUNTING AND TRAPPING']}
df2 = { 'SICCode':[4941,7374,5065,4941,5122], 'Sector': ['Buseq','utils','shops','utils','Buseq']}
df1 = pd.DataFrame(df1)
df2 = pd.DataFrame(df2)
conn = sql.connect('test_database.db')
c = conn.cursor()
c.execute('CREATE TABLE IF NOT EXISTS df1 (SICCode Integer, Industry_Title Text)')
c.execute('CREATE TABLE IF NOT EXISTS df2 (SICCode Integer, Sector Text)')
df1.to_sql('df1', conn, if_exists='replace', index = False)
df2.to_sql('df2', conn, if_exists='replace', index = False)
# create the database witht the table
to_replace = c.execute("""
Select df2.SICCode,Industry_Title From
df1 Join df2 On df1.SICCode=df2.SICCode
""").fetchall()
# select rows with same SICCode in df1 and df2
for x in to_replace:
c.execute("""
Update df2 Set Sector = '{}'
Where SICCode = {}
""".format(x[1],x[0]))
#updates the row of df2
conn.commit()
conn.close()