Python - How to merge column values from one df to match rows in another df?-CodePudding

Can someone please show me how to merge df2 to df1 for just the matching cities, then use the df2's average monthly temperature columns to match to each city's date range (according to the month) into a new column called 'Temp' in df1?

These are sample data of much larger files for state and cities in Brazil.

df1

      State   City         Dates
0       AC    Rio Branco   3/20/2020
1       BA    Salvador     5/2/2020
2       CE    Fortaleza    4/6/2020
3       AC    Rio Branco   5/30/2020

df2: has average monthly temperatures for each city.

      State   City         MAR   APR  MAY  
0       CE    Fortaleza    75.6  72.7 69.4
1       ES    Vitória      69.1  64.6 62.7
2       AC    Rio Branco   72.8  70.5 68.9
3       BA    Salvador     74.6  71.3 70.1

Desired output: df1 with new column 'Temp'

      State   City         Dates      Temp
0       AC    Rio Branco   3/20/2020  72.8
1       BA    Salvador     5/2/2020   70.1
2       CE    Fortaleza    4/6/2020   72.7
3       AC    Rio Branco   5/30/2020  68.9

CodePudding user response：

You should first convert the date field to datetime, if it is not. Then you can extract the month in MON format so that it matches the column names. The Temp column can be created by checking the Month column and assigning the value from the appropriate column. Finally remove the interim columns created. Hope this is what you are looking for.

import datetime
import calendar
df1=pd.DataFrame({'State': ['AC', 'BA', 'CE', 'AC'], 'City':['Rio Branco', 'Salvador', 'Fortaleza', 'Rio Branco'], 
                  'Dates':['3/20/2020', '5/2/2020', '4/6/2020', '5/30/2020']})
df2=pd.DataFrame({'State': ['CE', 'ES', 'AC', 'BA'], 'City':['Fortaleza', 'Vitória', 'Rio Branco', 'Salvador'], 
                  'MAR' : [75.6, 69.1, 72.8, 74.6], 'APR' : [72.7, 64.6, 70.5, 71.3], 'MAY': [69.4, 62.7, 68.9, 70.1]})

df1['Dates']=pd.to_datetime(df1['Dates']) ##Convert to datetime
df1 = pd.merge(df1,df2, on='City', how="inner") ##Merge the dfs using City as the primary key
df1['Month']=df1.Dates.dt.month.apply(lambda x: calendar.month_abbr[x]).str.upper() ## Get MON for each date
df1['Temp']=np.where(df1['Month'] == 'MAR', df1['MAR'], np.where(df1['Month']=='APR', df1['APR'], df1['MAY'])) ## Add Temp value
df1=df1.drop(columns=['State_y', 'MAR', 'APR', 'MAY', 'Month']).rename(columns={'State_x':'State'}) #Drop unnecessary columns
print(df1)

Output

    State   City    Dates   Temp
0   AC  Rio Branco  2020-03-20  72.8
1   AC  Rio Branco  2020-05-30  68.9
2   BA  Salvador    2020-05-02  70.1
3   CE  Fortaleza   2020-04-06  72.7

CodePudding user response：

You can use a merge after reshaping df2 to long form with melt and extracting the month abbreviation with to_datetime and strftime:

(df1.assign(month=pd.to_datetime(df1['Dates']).dt.strftime('%b').str.upper())
    .merge(df2.melt(['State', 'City'], var_name='month', value_name='Temp'),
           on=['State', 'City', 'month'])
   #.drop(columns='month') # uncomment to remove the column
)

output:

  State        City      Dates month  Temp
0    AC  Rio Branco  3/20/2020   MAR  72.8
1    BA    Salvador   5/2/2020   MAY  70.1
2    CE   Fortaleza   4/6/2020   APR  72.7
3    AC  Rio Branco  5/30/2020   MAY  68.9

CodePudding user response：

you can use a function:

df1['Dates']=pd.to_datetime(df1['Dates'])
df1['Month'] = df1['Dates'].dt.strftime('%b').str.upper()
final=df1.merge(df2,on='State')

def get_value(x,row):
    ort=final[x].iat[row]
    return ort

final['Temp']=final.apply(lambda x: get_value(x['Month'],row=x.name),axis=1)

#now rename columns and format date
final=final[['State','City_x','Dates','Temp']]
final.columns=['State','City','Dates','Temp']
final['Dates'] = final['Dates'].dt.strftime('%d/%m/%Y')
print(final)
'''
    State   City        Dates       Temp
0   AC      Rio Branco  20/03/2020  72.8
1   AC      Rio Branco  30/05/2020  68.9
2   BA      Salvador    02/05/2020  70.1
3   CE      Fortaleza   06/04/2020  72.7

'''