I am trying to evaluate a temperature dataset that I extracted from a GCM against my observed data. I used exactly the same script for the precipitation files and it worked well, but when I run it for temperature it raises an error. The input files are prepared in exactly the same format as the precipitation files, so it should work. The error is as follows:
"TypeError: simulations array must contain numerical values"
All my files, i.e. the simulated file, the station file, and the output file I am supposed to get, are in CSV format.
I am sharing the script below. Please have a look and help me out. My simulated and observed files are here: https://drive.google.com/drive/folders/1u5kgCSVbReDzv1bgh1l_YJjmv1iwatlh?usp=sharing
Edit: The script is quite long, so I am sharing only the relevant parts. Please let me know if it is not clear enough.
for file in os.listdir("input/sim"):
    if file.endswith(".csv"):
        simulated_data=pd.read_csv(os.path.join("input/sim", file))
        simulated_data['Date']=pd.to_datetime(simulated_data['Date'])
        simulated_data.index=simulated_data['Date']
        simulated_data.drop(['Date'],axis=1,inplace=True)
        simulated_data.index
        for s in lat_lon['Stations']:
            Ob_data=pd.DataFrame(Observed_data[str(s)])
            sim_data=pd.DataFrame(simulated_data[str(s)])
            for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
            nse = he.evaluator(he.nse,Ob_data, sim_data)
            nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
            R2=my_rsquare(Ob_data, sim_data)
            R22=pd.DataFrame(R2, columns=['R2'], index=[str(s)])
            MSE=mean_squared_error(Ob_data, sim_data)
            MSE1=pd.DataFrame(MSE, columns=['MSE'], index=[str(s)])
            RMSE=math.sqrt(MSE)
            RMSE1=pd.DataFrame(RMSE, columns=['RMSE'], index=[str(s)])
            corr = for_cor.corr()
            corr1=pd.DataFrame(corr.iloc[0,1], columns=['Pearson_R'], index=[str(s)])
            mae=mean_absolute_error(Ob_data, sim_data)
            mae1=pd.DataFrame(mae, columns=['MAE'], index=[str(s)])
            kge, r, alpha, beta = he.evaluator(he.kge, Ob_data, sim_data)
            kge_results=pd.DataFrame([kge], columns=['kge'],index=[str(s)])
            globals()['kge_'+str(s)]=kge_results
            Perf=pd.concat([nse1,R22,MSE1,corr1,kge_results,mae1,RMSE1],axis=1,copy=True)
            globals()['perform_'+str(s)]=Perf
        for s in lat_lon['Stations']:
            All_stations=pd.concat([globals()['perform_'+str(s)] for s in lat_lon['Stations']],axis=0,copy=True)
        globals()['result']=All_stations
        final=pd.concat([result,lat_lon],axis=1,copy=True,sort=True)
        final.drop('Stations',axis=1,inplace=True)
        new_path = os.path.join("output/", file)
        final.to_csv(new_path)
        result['Stations']=result.index
        result.index=result['Stations']
        result.drop('Stations',axis=1,inplace=True)
And the error log:
TypeError Traceback (most recent call last)
Input In [49], in <cell line: 1>()
11 sim_data=pd.DataFrame(simulated_data[str(s)])
12 for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
---> 13 nse = he.evaluator(he.nse,Ob_data, sim_data)
14 nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
15 R2=my_rsquare(Ob_data, sim_data)
File D:\Program Files\Python\ANACONDA\lib\site-packages\hydroeval\hydroeval.py:158, in evaluator(obj_fn, simulations, evaluation, axis, transform, epsilon)
156 raise TypeError('simulations must be an array')
157 if not np.issubdtype(simulations.dtype, np.number):
--> 158 raise TypeError('simulations array must contain numerical values')
159 evaluation = np.asarray(evaluation)
160 if not evaluation.shape:
TypeError: simulations array must contain numerical values
And this is how I have defined the NSE function:
#NSE Function
import statistics
import pandas as pd
def my_nse(arr1,arr2):
    numsum=densum=0
    my_new=pd.DataFrame()
    my_new['Observed_Discharge']=arr1
    my_new['Simulated_Discharge']=arr2
    mean_val_obs=statistics.mean(my_new['Observed_Discharge'])
    i=0
    while i<len(my_new['Simulated_Discharge'].values):
        num=(my_new['Observed_Discharge'][i])-(my_new['Simulated_Discharge'][i])
        num=num*num
        den=(my_new['Observed_Discharge'][i])-mean_val_obs
        den=den*den
        numsum=numsum+num
        densum=densum+den
        i=i+1
    cons=numsum/densum
    nse=1-cons
    return nse
Thank you
CodePudding user response:
In this case you can try forcing the conversion of your DataFrame to float32, since you want floating-point numbers.
I've created this script to do so:
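import pandas as pd
# read_csv already returns a DataFrame
df = pd.read_csv("Observed_data.csv")
# Drop the non-numeric Date column before converting
df = df.drop(labels="Date", axis=1)
# Raises ValueError if any cell cannot be parsed as a number
df = df.astype('float32')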
Actually, while doing so I encountered this error:
ValueError: could not convert string to float: '#VALUE!'
I had a look at the data you provided and saw that you actually have a '#VALUE!' string on line 204, column V429. It's a mistake that you have to fix manually; once you handle it, you should be good. A single string among float data, even by mistake, makes the whole column non-numerical, which is why the dtype check in he.evaluator rejects it.
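If you would rather locate such cells programmatically than scan the spreadsheet by eye, something along these lines might help. It is only a sketch: the tiny inline CSV and the station names ST1 and ST2 are made up for illustration and stand in for your Observed_data.csv, and find_bad_cells is a hypothetical helper, not part of pandas or hydroeval.

```python
import io
import pandas as pd

# Hypothetical miniature of Observed_data.csv: one cell holds a spreadsheet error string.
csv_text = """Date,ST1,ST2
2000-01-01,21.5,19.0
2000-01-02,#VALUE!,18.4
2000-01-03,22.1,18.9
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"], index_col="Date")


def find_bad_cells(frame):
    """Return (index label, column, raw value) for every cell that is not a number."""
    bad = []
    for col in frame.columns:
        # Columns pandas already parsed as numeric need no further check.
        if pd.api.types.is_numeric_dtype(frame[col]):
            continue
        converted = pd.to_numeric(frame[col], errors="coerce")
        # A cell is "bad" if it had a value but could not be converted.
        mask = converted.isna() & frame[col].notna()
        for idx, raw in frame.loc[mask, col].items():
            bad.append((idx, col, raw))
    return bad


bad_cells = find_bad_cells(df)
print(bad_cells)

# Optional: turn every unparseable cell into NaN so all columns become numeric.
clean = df.apply(pd.to_numeric, errors="coerce")
print(clean.dtypes)
```

Coercing leaves NaN where the bad cell was, so you would probably still need to drop or fill those rows in both the observed and simulated series before computing the metrics. Passing na_values=["#VALUE!"] to pd.read_csv should have a similar effect at load time.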