I'm working with a fairly large DataFrame that has multiple columns. It looks something like this:
Date | Temp | Dewpt_Temp | Rainfall (cm) | Snowfall (cm) |
---|---|---|---|---|
12/16/2021 | -1.6 | -5.4 | 0 | 6.7 |
12/17/2021 | -5.5 | -12.4 | 0 | 0 |
.......... | .... | .......... | ............. | ............. |
I have formulas I want to apply to the DataFrame to calculate new variables: saturation vapor pressure, vapor pressure, and relative humidity. Here is my code:
import pandas as pd
data = pd.read_csv('file path/weather_data.csv')
def new_vars(dataframe):
    temp = dataframe.Temp
    dewpt = dataframe.Dewpt_Temp
    e = 6.11*(10**((7.5*dewpt)/(273.3 + dewpt)))
    e_s = 6.11*(10**((7.5*temp)/(273.3 + temp)))
    rh = (e/e_s) * 100
    return (e, e_s, rh)
new_df = data.apply(lambda x: new_vars(data), axis=1)
The code runs, but it seems to compute the new variables using only the last row of the DataFrame. The output has the same number of rows as the original DataFrame, but the calculated values are identical in every row, seemingly based only on the last row of the original data. Am I missing something that would prevent this?
I know there are probably simpler ways of calculating the new variables given it's in a DataFrame, but I have more complex equations that I will need to use in the future, so I wanted to get practice using a user-defined function.
CodePudding user response:
Can you try this:
new_df = pd.DataFrame()
new_df[['e', 'e_s', 'rh']] = data.apply(lambda x: new_vars(x), axis=1)
Full code:
import pandas as pd
data = pd.read_csv('file path/weather_data.csv')
def new_vars(dataframe):
    # With axis=1, `dataframe` is a single row (a Series), not the whole DataFrame
    temp = dataframe.Temp
    dewpt = dataframe.Dewpt_Temp
    e = 6.11*(10**((7.5*dewpt)/(273.3 + dewpt)))
    e_s = 6.11*(10**((7.5*temp)/(273.3 + temp)))
    rh = (e/e_s) * 100
    return (e, e_s, rh)
new_df = pd.DataFrame()
new_df[['e', 'e_s', 'rh']] = data.apply(lambda x: new_vars(x), axis=1)
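If the column assignment complains about mismatched lengths on your pandas version, one possible variation is to ask `apply` to expand the returned tuple into columns itself. This is only a sketch: the two inline rows stand in for the CSV (values taken from the table in the question), and `result_type="expand"` is my addition, not part of the code above.

```python
import pandas as pd

# Stand-in for the CSV: two rows copied from the question's table.
data = pd.DataFrame({
    "Temp": [-1.6, -5.5],
    "Dewpt_Temp": [-5.4, -12.4],
})

def new_vars(row):
    # `row` is one row of the DataFrame (a Series), so temp and dewpt are scalars.
    temp = row.Temp
    dewpt = row.Dewpt_Temp
    # Constants as given in the question.
    e = 6.11 * (10 ** ((7.5 * dewpt) / (273.3 + dewpt)))
    e_s = 6.11 * (10 ** ((7.5 * temp) / (273.3 + temp)))
    rh = (e / e_s) * 100
    return (e, e_s, rh)

# Pass the function itself so each row is handed to it in turn;
# result_type="expand" turns each returned tuple into three columns.
new_df = data.apply(new_vars, axis=1, result_type="expand")
new_df.columns = ["e", "e_s", "rh"]
print(new_df)
```

Each output row is now computed from its own `Temp` and `Dewpt_Temp`, so the values should differ from row to row.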
CodePudding user response:
Try this:
new_df[['e', 'e_s', 'rh']] = data.apply(lambda x: new_vars(x['Temp'], x['Dewpt_Temp']), axis=1)
And in the function declaration:
def new_vars(temp, dewpt):
And delete these two lines:
temp = dataframe.Temp
dewpt = dataframe.Dewpt_Temp
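Put together, that could look roughly like the sketch below. The inline sample rows replace the CSV, and `result_type="expand"` plus the `index=data.index` argument are assumptions of mine to make the column assignment line up; neither comes from the original code.

```python
import pandas as pd

# Stand-in for the CSV: two rows copied from the question's table.
data = pd.DataFrame({
    "Temp": [-1.6, -5.5],
    "Dewpt_Temp": [-5.4, -12.4],
})

def new_vars(temp, dewpt):
    # Takes plain values instead of a row; constants as given in the question.
    e = 6.11 * (10 ** ((7.5 * dewpt) / (273.3 + dewpt)))
    e_s = 6.11 * (10 ** ((7.5 * temp) / (273.3 + temp)))
    rh = (e / e_s) * 100
    return (e, e_s, rh)

# Row-wise: pull the two scalars out of each row and pass them in.
new_df = pd.DataFrame(index=data.index)
new_df[["e", "e_s", "rh"]] = data.apply(
    lambda x: new_vars(x["Temp"], x["Dewpt_Temp"]),
    axis=1,
    result_type="expand",
)

# The same function also accepts whole columns, because the arithmetic
# is applied element by element to a Series.
e_col, e_s_col, rh_col = new_vars(data["Temp"], data["Dewpt_Temp"])
print(new_df)
```

A side benefit of taking `temp` and `dewpt` as arguments is that the function no longer depends on the column names, so it can be reused on single numbers, rows, or entire columns.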