How to Solve "simulations array must contain numerical values" error when my csv files are

Time:09-04

I am trying to evaluate a temperature dataset that I extracted from a GCM against my observed data. I used exactly the same script for the precipitation files and it worked well, but when I run it for temperature it raises an error. The temperature input files are prepared in exactly the same format as the precipitation files, so it should work. The error is as follows:
"TypeError: simulations array must contain numerical values"
All my files (the simulated file, the station file, and the output file I am supposed to get) are in .csv format. I am sharing the script below; please have a look and help me out. My simulated and observed files are here: https://drive.google.com/drive/folders/1u5kgCSVbReDzv1bgh1l_YJjmv1iwatlh?usp=sharing

Edit: The script is quite long, so I am sharing only the relevant parts. Please let me know if it is not clear enough.

for file in os.listdir("input/sim"):
    if file.endswith(".csv"):
        simulated_data=pd.read_csv(os.path.join("input/sim", file))
    simulated_data['Date']=pd.to_datetime(simulated_data['Date'])
    simulated_data.index=simulated_data['Date']
    simulated_data.drop(['Date'],axis=1,inplace=True)
    simulated_data.index
       
    for s in lat_lon['Stations']:
        Ob_data=pd.DataFrame(Observed_data[str(s)])
        sim_data=pd.DataFrame(simulated_data[str(s)])
        for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
        nse = he.evaluator(he.nse,Ob_data, sim_data)
        nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
        R2=my_rsquare(Ob_data, sim_data)
        R22=pd.DataFrame(R2, columns=['R2'], index=[str(s)])
        MSE=mean_squared_error(Ob_data, sim_data)
        MSE1=pd.DataFrame(MSE, columns=['MSE'], index=[str(s)])
        RMSE=math.sqrt(MSE)
        RMSE1=pd.DataFrame(RMSE, columns=['RMSE'], index=[str(s)])
        corr = for_cor.corr()
        corr1=pd.DataFrame(corr.iloc[0,1], columns=['Pearson_R'], index=[str(s)])
        mae=mean_absolute_error(Ob_data, sim_data)
        mae1=pd.DataFrame(mae, columns=['MAE'], index=[str(s)])
        kge, r, alpha, beta = he.evaluator(he.kge, Ob_data, sim_data)
        kge_results=pd.DataFrame([kge], columns=['kge'],index=[str(s)])
        globals()['kge_' + str(s)]=kge_results
        Perf=pd.concat([nse1,R22,MSE1,corr1,kge_results,mae1,RMSE1],axis=1,copy=True)
        globals()['perform_' + str(s)]=Perf
    
    for s in lat_lon['Stations']:   
        All_stations=pd.concat([globals()['perform_' + str(s)] for s in lat_lon['Stations']],axis=0,copy=True)
        globals()['result']=All_stations
    
        final=pd.concat([result,lat_lon],axis=1,copy=True,sort=True)
        final.drop('Stations',axis=1,inplace=True)
    
        new_path = os.path.join("output/", file)
        final.to_csv(new_path)

        result['Stations']=result.index
        result.index=result['Stations']
        result.drop('Stations',axis=1,inplace=True)    

and the Error log:

TypeError                                 Traceback (most recent call last)
Input In [49], in <cell line: 1>()
     11 sim_data=pd.DataFrame(simulated_data[str(s)])
     12 for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
---> 13 nse = he.evaluator(he.nse,Ob_data, sim_data)
     14 nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
     15 R2=my_rsquare(Ob_data, sim_data)

File D:\Program Files\Python\ANACONDA\lib\site-packages\hydroeval\hydroeval.py:158, in evaluator(obj_fn, simulations, evaluation, axis, transform, epsilon)
    156     raise TypeError('simulations must be an array')
    157 if not np.issubdtype(simulations.dtype, np.number):
--> 158     raise TypeError('simulations array must contain numerical values')
    159 evaluation = np.asarray(evaluation)
    160 if not evaluation.shape:

TypeError: simulations array must contain numerical values  

And this is how I have defined the NSE function:

#NSE Function

import statistics
import pandas as pd

def my_nse(arr1,arr2):
    
    numsum=densum=0
    
    my_new=pd.DataFrame()
    my_new['Observed_Discharge']=arr1
    my_new['Simulated_Discharge']=arr2
    
    mean_val_obs=statistics.mean(my_new['Observed_Discharge'])

    i=0
    while i<len(my_new['Simulated_Discharge'].values):
        
        num=(my_new['Observed_Discharge'][i])-(my_new['Simulated_Discharge'][i])
        num=num*num

        den=(my_new['Observed_Discharge'][i])-mean_val_obs
        den=den*den

        numsum=numsum+num
        densum=densum+den

        i=i+1

    cons=numsum/densum
    nse=1-cons
    
    return nse  

Thank you

CodePudding user response:

In this case you can try forcing the conversion of your DataFrame to float32, since you want floating-point numbers.

I've created this script to do so:

import os 
import pandas as pd

data = pd.read_csv("Observed_data.csv")
df = pd.DataFrame(data)
df = df.drop(labels="Date", axis=1)
df = df.astype('float32')

While doing so, I actually encountered this error:

ValueError: could not convert string to float: '#VALUE!'

I had a look at the data you provided and saw that you have a '#VALUE!' string on line 204, column V429. It's a mistake that you have to fix manually. Once you handle it you should be good. A single string among float data, even one that is there by mistake, makes pandas read the whole column as non-numerical, and that is what the `np.issubdtype(simulations.dtype, np.number)` check in your traceback rejects.
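If you would rather locate such cells programmatically than hunt for them in a spreadsheet, something along these lines may help. It is only a sketch: the inline CSV, its dates, and the V428 column are made up for illustration and stand in for Observed_data.csv. It coerces every column with `pd.to_numeric(errors="coerce")` and reports the cells that held a value before the conversion but are NaN afterwards.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for Observed_data.csv (made-up values);
# one cell holds a spreadsheet error string instead of a number.
csv_text = """Date,V428,V429
2000-01-01,21.5,19.0
2000-01-02,22.1,#VALUE!
2000-01-03,20.8,18.4
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="Date")

# The column containing the string is read as text, not as floats.
print(df.dtypes)

# Try to convert every column; unparseable cells become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# A cell that had a value before but is NaN now could not be parsed.
bad_mask = (numeric.isna() & df.notna()).to_numpy()
rows, cols = np.where(bad_mask)
bad_cells = [(df.index[r], df.columns[c]) for r, c in zip(rows, cols)]
print(bad_cells)

# One possible way to handle it (an assumption, not the only option):
# treat the error string as a missing value when reading the file.
clean = pd.read_csv(
    io.StringIO(csv_text), index_col="Date", na_values=["#VALUE!"]
)
print(clean.dtypes)
```

With `na_values` the column comes back as float, but the cell is now NaN, so you would still need to decide how to treat that missing value (correct it in the source file, drop the row, or fill it) before computing the metrics. Also note that your traceback shows the signature `evaluator(obj_fn, simulations, evaluation, ...)`, while your call is `he.evaluator(he.nse, Ob_data, sim_data)`. The observed data therefore lands in the `simulations` slot, which would explain why the message blames the simulations array even though the bad cell appears to be in the observed file.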
