Python Iterate through rows, run and save

Time:02-13

I have a pandas DataFrame and want to loop over its rows, run the code in each row, and save the output. If a row raises an error, I want to ignore it and move on to the next row.

import pandas as pd
from nsepy import get_history  # can be installed with "pip install nsepy"
from datetime import date

data = {'script': ['SBIN = get_history(symbol="SBIN", start=date(1985,1,1), end=date(2022,1,31))',
'SAIL = get_history(symbol="SAIL", start=date(1985,1,1), end=date(2022,1,31))', 
'20MICRONS = get_history(symbol="20MICRONS", start=date(1985,1,1), end=date(2022,1,31))',
'RELIANCE = get_history(symbol="RELIANCE", start=date(1985,1,1), end=date(2022,1,31))']}  

df = pd.DataFrame(data)  

Now I want to run each line one by one. I can do it manually like this:

# run each row
# 1
SBIN = get_history(symbol="SBIN", start=date(1985,1,1), end=date(2022,1,31))
SBIN.to_csv('SBIN', sep="\t")
# 2
SAIL = get_history(symbol="SAIL", start=date(1985,1,1), end=date(2022,1,31))
SAIL.to_csv('SAIL', sep="\t")
# 3 (a Python name cannot start with a digit, so the variable is renamed here)
MICRONS20 = get_history(symbol="20MICRONS", start=date(1985,1,1), end=date(2022,1,31))
MICRONS20.to_csv('20MICRONS', sep="\t")
# 4
RELIANCE = get_history(symbol="RELIANCE", start=date(1985,1,1), end=date(2022,1,31))
RELIANCE.to_csv('RELIANCE', sep="\t")

But this is going to take a huge amount of time. How can it be done with a for loop or a while loop?

Please note that I would like to run each row and save the output under the name that appears before the = sign in that row, for example "SBIN" for the first row. If any row raises an error, ignore the error and move on to the next row (row 3 is going to return an error because no data is available for it).

CodePudding user response:

As your process is I/O-bound, you can use threading to speed it up. You can try this:

import pandas as pd
from nsepy import get_history
from datetime import date
import concurrent.futures

history = {
    "SBIN": {"start": date(2021, 1, 1), "end": date(2022, 1, 31)},
    "SAIL": {"start": date(2021, 1, 1), "end": date(2022, 1, 31)},
    "20MICRONS": {"start": date(2021, 1, 1), "end": date(2022, 1, 31)},
    "RELIANCE": {"start": date(2021, 1, 1), "end": date(2022, 1, 31)},
}


def get_historical_data(symbol, /, **kwds):
    print(symbol)
    df = get_history(symbol, **kwds)
    df.to_csv(f'{symbol}.csv', sep='\t')
    return df


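with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Map each future back to its symbol so failures can be reported.
    future_history = {
        executor.submit(get_historical_data, symbol, **params): symbol
        for symbol, params in history.items()
    }

    data = []
    for future in concurrent.futures.as_completed(future_history):
        try:
            data.append(future.result())
        except Exception as exc:
            # Skip symbols that fail (e.g. no data available) and move on.
            print(f"{future_history[future]} failed: {exc}")
    df = pd.concat(data)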
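
If you would rather keep a plain for loop, as asked in the question, something like the sketch below might work. It takes the text before the = sign of each row as the symbol, fetches and saves the data, and skips any row that raises. The names symbol_from_script and run_rows, the fetch parameter, and the .tsv file extension are my own assumptions for illustration, not part of nsepy or pandas.

```python
from pathlib import Path


def symbol_from_script(script):
    """Return the text before the first '=' sign, e.g. 'SBIN'."""
    return script.split("=", 1)[0].strip()


def run_rows(scripts, fetch, out_dir="."):
    """Fetch and save each symbol; collect failures instead of stopping.

    `fetch` is a hypothetical callable that takes a symbol and returns an
    object with a pandas-style to_csv method (for example a DataFrame).
    """
    saved, failed = [], []
    for script in scripts:
        symbol = symbol_from_script(script)
        try:
            frame = fetch(symbol)
            frame.to_csv(Path(out_dir) / f"{symbol}.tsv", sep="\t")
            saved.append(symbol)
        except Exception as exc:  # deliberately broad: skip any failing row
            print(f"skipping {symbol}: {exc}")
            failed.append(symbol)
    return saved, failed
```

With the DataFrame from the question, the call could look like run_rows(df["script"], lambda s: get_history(symbol=s, start=date(1985,1,1), end=date(2022,1,31))). This avoids running the stored strings with exec, and the returned failed list tells you which symbols were skipped.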