How to save each iteration of a loop to a single .csv file using pandas


I'm using the code below to scrape the latest daily prices for a number of funds:

import requests
import pandas as pd

urls = ['https://markets.ft.com/data/funds/tearsheet/historical?s=LU0526609390:EUR', 'https://markets.ft.com/data/funds/tearsheet/historical?s=IE00BHBX0Z19:EUR', 
'https://markets.ft.com/data/funds/tearsheet/historical?s=LU1076093779:EUR']

def format_date(date):
    # Keep only the 'Month DD YYYY' part of the scraped date string
    date = date.split(',')[-2][1:] + date.split(',')[-1]

    return pd.Series({'Date': date})

for url in urls:
    # Build the id from the ISIN in the URL, e.g. 'LU0526609390:EUR' -> 'LU0526609390.OTHER'
    ISIN = url.split('=')[-1].replace(':', '_')
    ISIN = ISIN[:-4]
    ISIN = ISIN + ".OTHER"
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[-1]
    df['Date'] = df['Date'].apply(format_date)
    del df['Open']
    del df['High']
    del df['Low']
    del df['Volume']
    df = df.rename(columns={'Close': 'last_traded_price'})
    df = df.rename(columns={'Date': 'last_traded_on'})
    df.insert(2, "id", ISIN)
    df = df.head(1)  # keep only the latest row for this fund
    print(df)
    df.to_csv(r'/Users/.../Testdata.csv', index=False)

At the moment, the Testdata.csv file is being overwritten every time a new loop iteration starts, and I would like to find a way to save all of the data into the .csv file in this format:

Col 1            Col 2                Col 3
last_traded_on   last_traded_price    id
Oct 07 2021      78.83                LU0526609390.OTHER
Oct 07 2021      11.1                 IE00BHBX0Z19.OTHER
Oct 07 2021      155.56               LU1076093779.OTHER

I need to find a way to save the data to the .csv file outside of the loop, but I'm really struggling to work out how.

Thank you

CodePudding user response:

Open the file once, before the loop, and pass the handle to to_csv:

with open(r'/Users/.../Testdata.csv', 'w') as csvfile:
    # Here, you need to write headers:
    # csvfile.write("header1,header2,header3\n")
    for url in urls:
        ISIN = url.split('=')[-1].replace(':', '_')
        ...  # The rest of your code
        df.to_csv(csvfile, index=False, header=False)
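
For reference, here is a minimal, self-contained sketch of that pattern; the one-row frames below just reuse the example rows from the desired output above, standing in for the scraped data:

import pandas as pd

frames = [
    pd.DataFrame({'last_traded_on': ['Oct 07 2021'], 'last_traded_price': [78.83], 'id': ['LU0526609390.OTHER']}),
    pd.DataFrame({'last_traded_on': ['Oct 07 2021'], 'last_traded_price': [11.1], 'id': ['IE00BHBX0Z19.OTHER']}),
]

with open('Testdata.csv', 'w') as csvfile:
    # Write the header row once, before the loop starts
    csvfile.write('last_traded_on,last_traded_price,id\n')
    for df in frames:
        # header=False so the column names are not repeated for every fund
        df.to_csv(csvfile, index=False, header=False)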

Better still, collect each DataFrame in a list, then use pd.concat to merge them all and write the result to a single file:

dfs = [] 
for url in urls:
    ISIN = url.split('=')[-1].replace(':', '_')
    ...  # The rest of your code
    dfs.append(df)

pd.concat(dfs).to_csv(r'/Users/.../Testdata.csv', index=False)
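
An advantage of collecting the frames first is that you can sort, deduplicate, or sanity-check everything before anything is written to disk. A minimal sketch of this variant, again with made-up one-row frames standing in for the scraped data:

import pandas as pd

frames = [
    pd.DataFrame({'last_traded_on': ['Oct 07 2021'], 'last_traded_price': [78.83], 'id': ['LU0526609390.OTHER']}),
    pd.DataFrame({'last_traded_on': ['Oct 07 2021'], 'last_traded_price': [155.56], 'id': ['LU1076093779.OTHER']}),
]

# pd.concat stacks the one-row frames; ignore_index=True rebuilds a clean 0..n-1 index
combined = pd.concat(frames, ignore_index=True)
combined.to_csv('Testdata.csv', index=False)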

Note: your desired output looks like the output of df.to_string() rather than df.to_csv.
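
To illustrate the difference, assuming a single-row frame like the ones above:

import pandas as pd

df = pd.DataFrame({'last_traded_on': ['Oct 07 2021'],
                   'last_traded_price': [78.83],
                   'id': ['LU0526609390.OTHER']})

# to_string() gives aligned, human-readable columns, like the table shown in the question
print(df.to_string(index=False))

# to_csv() with no path returns the comma-separated text that actually lands in the file:
# last_traded_on,last_traded_price,id
# Oct 07 2021,78.83,LU0526609390.OTHER
print(df.to_csv(index=False))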
