How to loop the following code in Jupyter Notebook instead of repeating the same steps for each input file


I am trying to loop the following script. I have over 100 CSV input files, so looping will save me a lot of time. The script needs two input files (a simulated file and an observed file) plus a station file to produce the required output. Note that the observed file and the station file are single, constant files, while the simulated files number in the hundreds; each simulated file I feed into the script produces one output, so I want to automate the whole process. All files are in .csv format and I need the output in .csv as well, with each output file named the same as its input file. I have already created separate Input and Output directories, and within the Input directory a Simulated directory, so all the simulated .csv files are easy to access for looping.

PS: I am a beginner at programming, and thank you in advance for helping. (I am not attaching the libraries I used or how I defined the statistical indices, as the code is already lengthy; looping this code will be enough.)
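
For reference, this is the folder layout described above, matching the paths used in the code below:

input/
    Observed_data.csv        # single, constant observed file
    Stations_latlon.csv      # single, constant station file
    sim/
        Simulated_data.csv   # one of the ~100 simulated files
        ...
Output/                      # one result CSV per simulated file, named after it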

#INPUT_DATA
Observed_data=pd.read_csv('input/Observed_data.csv')
Observed_data['Date']=pd.to_datetime(Observed_data['Date'])
Observed_data.index=Observed_data['Date']
Observed_data.drop(['Date'],axis=1,inplace=True)
Observed_data.index

simulated_data=pd.read_csv('input/sim/Simulated_data.csv')
simulated_data['Date']=pd.to_datetime(simulated_data['Date'])
simulated_data.index=simulated_data['Date']
simulated_data.drop(['Date'],axis=1,inplace=True)
simulated_data.index

lat_lon=pd.read_csv('input/Stations_latlon.csv')
lat_lon.index=lat_lon['Stations']
#lat_lon.drop('Stations',axis=1,inplace=True)
lat_lon 
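
The observed and simulated blocks above repeat the same three steps (parse the Date column, set it as the index, drop it). A minimal sketch of a helper that factors this out; the name read_indexed_csv is just an illustration, not part of the original code:

import pandas as pd

def read_indexed_csv(path):
    # Read a CSV, parse its 'Date' column and use it as the DatetimeIndex
    df = pd.read_csv(path, parse_dates=['Date'])
    return df.set_index('Date')

# Equivalent to the two blocks above:
Observed_data = read_indexed_csv('input/Observed_data.csv')
simulated_data = read_indexed_csv('input/sim/Simulated_data.csv')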

#Model_Performance_Evaluation
import numpy as np
import pandas as pd

def clean_dataset(df):
    # Drop rows containing NaN or +/-inf and return a float64 copy
    assert isinstance(df, pd.DataFrame), "df needs to be a pd.DataFrame"
    df.dropna(inplace=True)
    indices_to_keep = ~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)
    return df[indices_to_keep].astype(np.float64)
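
Note that clean_dataset is defined here but never called in the loop below. If the intent is to strip NaN/inf rows before computing the metrics, a small example of how it behaves (the demo frame is made up):

demo = pd.DataFrame({'obs': [1.0, np.nan, 3.0], 'sim': [1.1, 2.0, np.inf]})
clean_dataset(demo)   # keeps only the first row; the NaN row and the inf row are dropped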

for s in lat_lon['Stations']:
    Ob_data=pd.DataFrame(Observed_data[str(s)])
    sim_data=pd.DataFrame(simulated_data[str(s)])
    for_cor=pd.concat([Ob_data,sim_data],axis=1,copy=True)
    nse = he.evaluator(he.nse,Ob_data, sim_data)
    nse1=pd.DataFrame(nse, columns=['NSE'], index=[str(s)])
    R2=my_rsquare(Ob_data, sim_data)
    R22=pd.DataFrame(R2, columns=['R2'], index=[str(s)])
    MSE=mean_squared_error(Ob_data, sim_data)
    MSE1=pd.DataFrame(MSE, columns=['MSE'], index=[str(s)])
    RMSE=math.sqrt(MSE)
    RMSE1=pd.DataFrame(RMSE, columns=['RMSE'], index=[str(s)])
    corr = for_cor.corr()
    corr1=pd.DataFrame(corr.iloc[0,1], columns=['Pearson_R'], index=[str(s)])
    mae=mean_absolute_error(Ob_data, sim_data)
    mae1=pd.DataFrame(mae, columns=['MAE'], index=[str(s)])
    kge, r, alpha, beta = he.evaluator(he.kge, Ob_data, sim_data)
    kge_results=pd.DataFrame([kge], columns=['kge'],index=[str(s)])
    globals()['kge_' + str(s)]=kge_results
    Perf=pd.concat([nse1,R22,MSE1,corr1,kge_results,mae1,RMSE1],axis=1,copy=True)
    globals()['perform_' + str(s)]=Perf
       
# A single concatenation over all stations replaces the redundant outer loop
All_stations=pd.concat([globals()['perform_' + str(s)] for s in lat_lon['Stations']],axis=0,copy=True)
result=All_stations

final=pd.concat([result,lat_lon],axis=1,copy=True,sort=True)
final.drop('Stations',axis=1,inplace=True)
   
ff='Output/Model_performance.csv'
final.to_csv(ff)

result['Stations']=result.index
result.index=result['Stations']
result.drop('Stations',axis=1,inplace=True)

CodePudding user response:

You can loop over all files in the input directory as shown below, check that each file ends with .csv, and save the corresponding output file using the path new_path:

import os
for file in os.listdir("input/sim"):              # folder holding the simulated CSVs
    if file.endswith(".csv"):
        new_path = os.path.join("Output", file)   # output named after the input file
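
Putting this together with the script from the question, a minimal sketch of the full loop. It assumes the per-station metrics loop above has been wrapped in a function, here called evaluate_stations(simulated_data), that returns the final DataFrame; that function name is my placeholder, not something defined in the question:

import os
import pandas as pd

sim_dir = 'input/sim'      # folder holding the ~100 simulated CSVs
out_dir = 'Output'         # results folder
os.makedirs(out_dir, exist_ok=True)

for file in os.listdir(sim_dir):
    if not file.endswith('.csv'):
        continue

    # Read and index the simulated file exactly as in the question
    simulated_data = pd.read_csv(os.path.join(sim_dir, file))
    simulated_data['Date'] = pd.to_datetime(simulated_data['Date'])
    simulated_data.index = simulated_data['Date']
    simulated_data.drop(['Date'], axis=1, inplace=True)

    # Hypothetical wrapper around the per-station metrics loop from the question
    final = evaluate_stations(simulated_data)

    # Name each output file after its input file, e.g. sim_01.csv -> Output/sim_01.csv
    final.to_csv(os.path.join(out_dir, file))

Because the observed and station files never change, Observed_data and lat_lon only need to be read once, before this loop, exactly as at the top of the question.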