Can someone please show me how to merge df2 to df1 for just the matching cities, then use the df2's average monthly temperature columns to match to each city's date range (according to the month) into a new column called 'Temp' in df1?
These are sample data of much larger files for state and cities in Brazil.
df1
State City Dates
0 AC Rio Branco 3/20/2020
1 BA Salvador 5/2/2020
2 CE Fortaleza 4/6/2020
3 AC Rio Branco 5/30/2020
df2: has average monthly temperatures for each city.
State City MAR APR MAY
0 CE Fortaleza 75.6 72.7 69.4
1 ES Vitória 69.1 64.6 62.7
2 AC Rio Branco 72.8 70.5 68.9
3 BA Salvador 74.6 71.3 70.1
Desired output: df1 with new column 'Temp'
State City Dates Temp
0 AC Rio Branco 3/20/2020 72.8
1 BA Salvador 5/2/2020 70.1
2 CE Fortaleza 4/6/2020 72.7
3 AC Rio Branco 5/30/2020 68.9
CodePudding user response:
You should first convert the date field to datetime, if it is not. Then you can extract the month in MON format so that it matches the column names. The Temp column can be created by checking the Month column and assigning the value from the appropriate column. Finally remove the interim columns created. Hope this is what you are looking for.
import datetime
import calendar
df1=pd.DataFrame({'State': ['AC', 'BA', 'CE', 'AC'], 'City':['Rio Branco', 'Salvador', 'Fortaleza', 'Rio Branco'],
'Dates':['3/20/2020', '5/2/2020', '4/6/2020', '5/30/2020']})
df2=pd.DataFrame({'State': ['CE', 'ES', 'AC', 'BA'], 'City':['Fortaleza', 'Vitória', 'Rio Branco', 'Salvador'],
'MAR' : [75.6, 69.1, 72.8, 74.6], 'APR' : [72.7, 64.6, 70.5, 71.3], 'MAY': [69.4, 62.7, 68.9, 70.1]})
df1['Dates']=pd.to_datetime(df1['Dates']) ##Convert to datetime
df1 = pd.merge(df1,df2, on='City', how="inner") ##Merge the dfs using City as the primary key
df1['Month']=df1.Dates.dt.month.apply(lambda x: calendar.month_abbr[x]).str.upper() ## Get MON for each date
df1['Temp']=np.where(df1['Month'] == 'MAR', df1['MAR'], np.where(df1['Month']=='APR', df1['APR'], df1['MAY'])) ## Add Temp value
df1=df1.drop(columns=['State_y', 'MAR', 'APR', 'MAY', 'Month']).rename(columns={'State_x':'State'}) #Drop unnecessary columns
print(df1)
Output
State City Dates Temp
0 AC Rio Branco 2020-03-20 72.8
1 AC Rio Branco 2020-05-30 68.9
2 BA Salvador 2020-05-02 70.1
3 CE Fortaleza 2020-04-06 72.7
CodePudding user response:
You can use a merge
after reshaping df2
to long form with melt
and extracting the month abbreviation with to_datetime
and strftime
:
(df1.assign(month=pd.to_datetime(df1['Dates']).dt.strftime('%b').str.upper())
.merge(df2.melt(['State', 'City'], var_name='month', value_name='Temp'),
on=['State', 'City', 'month'])
#.drop(columns='month') # uncomment to remove the column
)
output:
State City Dates month Temp
0 AC Rio Branco 3/20/2020 MAR 72.8
1 BA Salvador 5/2/2020 MAY 70.1
2 CE Fortaleza 4/6/2020 APR 72.7
3 AC Rio Branco 5/30/2020 MAY 68.9
CodePudding user response:
you can use a function:
df1['Dates']=pd.to_datetime(df1['Dates'])
df1['Month'] = df1['Dates'].dt.strftime('%b').str.upper()
final=df1.merge(df2,on='State')
def get_value(x,row):
ort=final[x].iat[row]
return ort
final['Temp']=final.apply(lambda x: get_value(x['Month'],row=x.name),axis=1)
#now rename columns and format date
final=final[['State','City_x','Dates','Temp']]
final.columns=['State','City','Dates','Temp']
final['Dates'] = final['Dates'].dt.strftime('%d/%m/%Y')
print(final)
'''
State City Dates Temp
0 AC Rio Branco 20/03/2020 72.8
1 AC Rio Branco 30/05/2020 68.9
2 BA Salvador 02/05/2020 70.1
3 CE Fortaleza 06/04/2020 72.7
'''