I'm using the code below to scrape the latest daily prices for a number of funds:
import requests
import pandas as pd

urls = ['https://markets.ft.com/data/funds/tearsheet/historical?s=LU0526609390:EUR',
        'https://markets.ft.com/data/funds/tearsheet/historical?s=IE00BHBX0Z19:EUR',
        'https://markets.ft.com/data/funds/tearsheet/historical?s=LU1076093779:EUR']

def format_date(date):
    # Keep only the short date at the end of the cell, e.g. "Oct 07 2021"
    date = date.split(',')[-2][1:] + date.split(',')[-1]
    return pd.Series({'Date': date})

for url in urls:
    # Build the id from the ISIN in the URL, e.g. "LU0526609390.OTHER"
    ISIN = url.split('=')[-1].replace(':', '_')
    ISIN = ISIN[:-4]
    ISIN = ISIN + ".OTHER"
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[-1]
    df['Date'] = df['Date'].apply(format_date)
    del df['Open']
    del df['High']
    del df['Low']
    del df['Volume']
    df = df.rename(columns={'Close': 'last_traded_price'})
    df = df.rename(columns={'Date': 'last_traded_on'})
    df.insert(2, "id", ISIN)
    df = df.head(1)
    print(df)
    df.to_csv(r'/Users/.../Testdata.csv', index=False)
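The two string manipulations in the loop can be checked offline, without hitting the FT site. This sketch assumes the Date cell reads like "Thursday, October 07, 2021Thu, Oct 07, 2021" (a long and a short date run together); that sample string is an assumption, not something confirmed from the page. Here format_date returns the plain string instead of wrapping it in pd.Series, and isin_id is a hypothetical helper that repeats the three ISIN lines from the loop.

```python
def format_date(date):
    # Take the last two comma-separated pieces, e.g. " Oct 07" and " 2021",
    # drop the leading space of the first and join them: "Oct 07 2021".
    return date.split(',')[-2][1:] + date.split(',')[-1]


def isin_id(url):
    # Hypothetical helper mirroring the three ISIN lines in the loop.
    isin = url.split('=')[-1].replace(':', '_')  # "LU0526609390_EUR"
    isin = isin[:-4]                             # strip the "_EUR" suffix
    return isin + ".OTHER"


sample = "Thursday, October 07, 2021Thu, Oct 07, 2021"  # assumed cell format
print(format_date(sample))  # Oct 07 2021
print(isin_id('https://markets.ft.com/data/funds/tearsheet/historical?s=LU0526609390:EUR'))
# LU0526609390.OTHER
```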
At the moment, Testdata.csv is overwritten every time the loop moves on to the next URL, and I would like to save all of the data into the .csv file in this format:
Col 1           Col 2              Col 3
last_traded_on  last_traded_price  id
Oct 07 2021     78.83              LU0526609390.OTHER
Oct 07 2021     11.1               IE00BHBX0Z19.OTHER
Oct 07 2021     155.56             LU1076093779.OTHER
I need a way to save the data to the .csv file outside of the loop, but I'm really struggling to work out how.
Thank you
Answer:
Open the file once, before the loop, and pass the file handle to to_csv:
with open(r'/Users/.../Testdata.csv', 'w', newline='') as csvfile:
    # Write the header row once, before the loop:
    csvfile.write("last_traded_on,last_traded_price,id\n")
    for url in urls:
        ISIN = url.split('=')[-1].replace(':', '_')
        ...  # The rest of your code, minus the final to_csv(...)
        # header=False so the column names are not repeated for every fund
        df.to_csv(csvfile, index=False, header=False)
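To see why this stops the overwriting, here is a self-contained sketch that replaces the real file with an io.StringIO buffer and the scraped frames with made-up one-row DataFrames (the values are copied from the desired output in the question, not scraped). Every to_csv call writes to the same open handle, so the rows accumulate instead of replacing each other.

```python
import io

import pandas as pd

# Made-up stand-ins for the one-row DataFrames built inside the loop.
rows = [
    ("Oct 07 2021", 78.83, "LU0526609390.OTHER"),
    ("Oct 07 2021", 11.1, "IE00BHBX0Z19.OTHER"),
]

buf = io.StringIO()  # plays the role of the file opened with open(..., 'w')
buf.write("last_traded_on,last_traded_price,id\n")  # header written once
for date, price, isin in rows:
    df = pd.DataFrame({"last_traded_on": [date],
                       "last_traded_price": [price],
                       "id": [isin]})
    # Same handle each time, no header: each call appends one data row.
    df.to_csv(buf, index=False, header=False)

print(buf.getvalue())
```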
Better still, collect each DataFrame in a list, call pd.concat once after the loop to combine them, and save the result to a single file:
dfs = []
for url in urls:
    ISIN = url.split('=')[-1].replace(':', '_')
    ...  # The rest of your code, minus the final to_csv(...)
    dfs.append(df)
pd.concat(dfs).to_csv(r'/Users/.../Testdata.csv', index=False)
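The list-then-concat pattern can be tried without any network access. This sketch uses made-up one-row DataFrames (values copied from the question's desired output) and calls to_csv without a path, which makes pandas return the CSV text as a string instead of writing a file.

```python
import pandas as pd

frames = []
for date, price, isin in [("Oct 07 2021", 78.83, "LU0526609390.OTHER"),
                          ("Oct 07 2021", 155.56, "LU1076093779.OTHER")]:
    # One small DataFrame per fund, like df at the end of each loop iteration
    frames.append(pd.DataFrame({"last_traded_on": [date],
                                "last_traded_price": [price],
                                "id": [isin]}))

# A single concat after the loop; ignore_index=True renumbers the rows 0..n-1
combined = pd.concat(frames, ignore_index=True)
csv_text = combined.to_csv(index=False)  # no path given: returns a str
print(csv_text)
```

Concatenating once at the end is generally preferable to growing a DataFrame inside the loop, since each pd.concat copies all of the data accumulated so far.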
Note: the expected output you show looks like the output of df.to_string() rather than df.to_csv().
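A quick way to see the difference is to render the same one-row DataFrame both ways (the row is copied from the question's desired output). to_string() pads columns with spaces for display, while to_csv() separates fields with commas, which is what actually ends up in Testdata.csv.

```python
import pandas as pd

df = pd.DataFrame({"last_traded_on": ["Oct 07 2021"],
                   "last_traded_price": [78.83],
                   "id": ["LU0526609390.OTHER"]})

# Space-aligned text meant for printing, not for a .csv file
print(df.to_string(index=False))
# Comma-separated text, the format to_csv writes to disk
print(df.to_csv(index=False))
```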