At the moment, running this code produces just a single .csv file containing only the last result. How can I export all the fetched data to one .csv file?
import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup

for id in range(1, 6):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
    s = s.replace('null', '"placeholder"')
    data = json.loads(s)
    data = json_normalize(data)
    matsit = pd.DataFrame(data)
    print(matsit)
    matsit.to_csv("matsit", index=False)
CodePudding user response:
At the moment you're only saving the last iteration of your loop. The key is to define a data structure outside the loop and add to it on each iteration. For example, you could define a DataFrame and extend it with pd.concat, as such:
df = pd.DataFrame()
for id in range(1, 6):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
    s = s.replace('null', '"placeholder"')
    data = json.loads(s)
    data = json_normalize(data)
    matsit = pd.DataFrame(data)
    df = pd.concat([df, matsit], axis=0)  # axis=0 stacks rows; axis=1 would put each result side by side
    print(matsit)
df.to_csv("matsit.csv", index=False)
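To see why the axis argument matters in pd.concat, a minimal sketch with two one-row frames:

```python
import pandas as pd

a = pd.DataFrame({"x": [1]})
b = pd.DataFrame({"x": [2]})

# axis=0 (the default) stacks the frames as rows, which appending results needs
print(pd.concat([a, b], axis=0).shape)  # (2, 1)

# axis=1 lines the frames up side by side as extra columns instead
print(pd.concat([a, b], axis=1).shape)  # (1, 2)
```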
CodePudding user response:
Simply collect your DataFrames in a list — and note that you do not need BeautifulSoup at all, since you can grab the JSON directly from the response:

data.append(pd.json_normalize(requests.get(url).json()))

and then concat them into a single one:

pd.concat(data, ignore_index=True).to_csv("matsit.csv", index=False)
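The collect-then-concat pattern in miniature, with inline frames standing in for the per-game results (the column name here is made up for illustration):

```python
import pandas as pd

# Stand-ins for the frames produced by pd.json_normalize on each game
parts = [pd.DataFrame({"x": [1, 2]}), pd.DataFrame({"x": [3]})]

combined = pd.concat(parts, ignore_index=True)
print(combined["x"].tolist())    # [1, 2, 3]
print(combined.index.tolist())   # [0, 1, 2] -- ignore_index renumbers the rows
```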
Note: You should also use pd.json_normalize(json.loads(s)) instead of the import from pandas.io.json, to avoid FutureWarning: pandas.io.json.json_normalize is deprecated,...
- Also avoid shadowing built-in names (id) with your own variables
Example

import requests
import pandas as pd

data = []
for i in range(1, 6):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{i}"
    data.append(pd.json_normalize(requests.get(url).json()))

pd.concat(data, ignore_index=True).to_csv("matsit.csv", index=False)