I have a large dataset split into three dfs.
df1 contains grided (lat-long) and vegetation type data.
df1:
col row Type TypeNo
-125 42.5 BF 4
-125 42.5 MTF 8
-125 42.5 TCF 9
-125 42.5 TDF 10
-125 42.5 SG 13
-125 42.5 AS 15
-125 42.5 XFW 19
-125 42.5 TBEF 33
-125 32.5 BF 4
-125 32.5 MTF 8
-125 32.5 TCF 9
-125 32.5 TDF 10
-125 32.5 SG 13
-125 32.5 AS 15
-125 32.5 XFW 19
-125 32.5 TBEF 33
df2 contains temperature data for each grid.
df2:
col row Tmin Tmax
-125 42.5 7.1 36.7
-125 32.5 7.1 16.7
df3 contains temperature limits for each vegetation type in df1.
df3:
Type TypeNo limit_Tmin limit_Tmax
BF 4 -60 21
MTF 8 -5 23
TCF 9 -40 10
TDF 10 -30 30
SG 13 -20 38
AS 15 -20 38
XFW 19 -40 15
TBEF 33 -40 20
I am trying to apply the temperature limits to df1 such that for every vegetation type:
if (df2: Tmax > df3: limit_Tmax) then vegetation is False, else, True
This has to be done in each grid.
An example output would be df1 and a new column with True/False, i.e.:
col row Type TypeNo Possible
-125 42.5 BF 4 False
-125 42.5 MTF 8 False
-125 42.5 TCF 9 False
-125 42.5 TDF 10 False
-125 42.5 SG 13 True
-125 42.5 AS 15 True
-125 42.5 XFW 19 False
-125 42.5 TBEF 33 False
-125 32.5 BF 4 True
-125 32.5 MTF 8 True
-125 32.5 TCF 9 False
-125 32.5 TDF 10 True
-125 32.5 SG 13 True
-125 32.5 AS 15 True
-125 32.5 XFW 19 False
-125 32.5 TBEF 33 True
I am new pandas and need help on how to do this. Thank you!
CodePudding user response:
First we use merge
to put all the data into a new data frame, then we define a new column for df1 based off the values of the new dataframe.
#pd is pandas
import pandas as pd
#Merge df1 and df2
dft = pd.merge(df1, df2, on=["col", "row"], how="left")
#Merge result of that with df3
dft = pd.merge(dft, df3, on=["Type", "TypeNo"], how="left")
#Create new column and put it in df1
df1["Possible"] = dft["Tmax"] <= dft["limit_Tmax"]
CodePudding user response:
just bunch of simple joins to get your desired result:
df_data = df1.merge(df3 , on=[ 'Type', 'TypeNo'],how='left')
df_data = df_data.merge(df2 , on=[ 'col', 'row'],how='left')
df1['possible'] = df_data['Tmax'] <= df_data['limit_Tmax']
print(df1)
output:
>>>
col row Type TypeNo possible
0 -125 42.5 BF 4 False
1 -125 42.5 MTF 8 False
2 -125 42.5 TCF 9 False
3 -125 42.5 TDF 10 False
4 -125 42.5 SG 13 True
5 -125 42.5 AS 15 True
6 -125 42.5 XFW 19 False
7 -125 42.5 TBEF 33 False
8 -125 32.5 BF 4 True
9 -125 32.5 MTF 8 True
10 -125 32.5 TCF 9 False
11 -125 32.5 TDF 10 True
12 -125 32.5 SG 13 True
13 -125 32.5 AS 15 True
14 -125 32.5 XFW 19 False
15 -125 32.5 TBEF 33 True