How to export all of the gathered data to .CSV?

Time:09-03

At the moment running this code will make just a single .csv file with only the last result included. How can I export all the fetched data to one .csv file?

import requests
import pandas as pd
import json
from pandas.io.json import json_normalize
from bs4 import BeautifulSoup
    
for id in range (1, 6):
      url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
      res = requests.get(url)
      soup = BeautifulSoup(res.content, "lxml")
      s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
      s = s.replace('null','"placeholder"')
      data = json.loads(s)
      data = json_normalize(data)
      matsit = pd.DataFrame(data)
      print (matsit)

matsit.to_csv("matsit", index=False)

CodePudding user response:

At the moment you're only saving the last iteration of your loop: matsit is overwritten on every pass, so only the final result reaches to_csv. The key is to define a data structure outside of the loop and add to it with each iteration. For example, you could define an empty DataFrame and add each result to it using pd.concat:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

df = pd.DataFrame()  # accumulator defined outside the loop

for id in range(1, 6):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{id}"
    res = requests.get(url)
    soup = BeautifulSoup(res.content, "lxml")
    s = soup.select('html')[0].text.strip('jQuery1720724027235122559_1542743885014(').strip(')')
    s = s.replace('null','"placeholder"')
    data = json.loads(s)
    matsit = pd.json_normalize(data)
    # Stack this iteration's rows under the rows collected so far.
    df = pd.concat([df, matsit], ignore_index=True)
    print(matsit)

df.to_csv("matsit.csv", index=False)
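
A note on the axis argument of pd.concat, as a minimal sketch on made-up data (the frames and column names below are hypothetical, not the real API output): the default axis=0 stacks rows under the same columns, which is usually what you want for one combined CSV, while axis=1 places the frames side by side and repeats the column names.

```python
import pandas as pd

# Hypothetical stand-ins for two per-game results.
first = pd.DataFrame({"game": [1, 1], "x": [10, 20]})
second = pd.DataFrame({"game": [2], "x": [30]})

# Default axis=0: rows are stacked, columns stay the same.
stacked = pd.concat([first, second], ignore_index=True)

# axis=1: frames are placed side by side, aligned on the row index.
side_by_side = pd.concat([first, second], axis=1)

print(stacked.shape)       # (rows, columns) after stacking
print(side_by_side.shape)  # (rows, columns) after joining side by side
```

With stacking, every iteration adds rows to the same set of columns, so the CSV keeps a single header.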

CodePudding user response:

Simply collect your DataFrames in a list. Note that you do not need BeautifulSoup, because you can get the JSON directly from the response:

data.append(pd.json_normalize(requests.get(url).json()))

Then concatenate them into a single DataFrame:

pd.concat(data, ignore_index=True).to_csv("matsit.csv", index=False)

Note: You should also use pd.json_normalize(json.loads(s)) instead of json_normalize imported from pandas.io.json, to avoid FutureWarning: pandas.io.json.json_normalize is deprecated,... Also avoid shadowing Python built-in names such as id with your loop variable.

Example

import requests
import pandas as pd

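data = []  # one DataFrame per request
for i in range(1, 6):
    url = f"https://liiga.fi/api/v1/shotmap/2022/{i}"
    # Parse the JSON body and flatten nested fields into columns.
    data.append(pd.json_normalize(requests.get(url).json()))

# Stack all DataFrames row-wise and write them to one file.
pd.concat(data, ignore_index=True).to_csv("matsit.csv", index=False)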
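
If some ids might return nothing, the same collect-then-concat pattern could be wrapped in a small helper. This is only a sketch under assumptions: collect_frames, fetch_json and fake_api are hypothetical names introduced here for illustration, and the payloads are made up so the example runs without network access.

```python
import pandas as pd

def collect_frames(ids, fetch_json):
    """Call fetch_json(i) for each id and stack the flattened results."""
    frames = []
    for i in ids:
        payload = fetch_json(i)
        if not payload:  # skip ids that returned no records
            continue
        frames.append(pd.json_normalize(payload))
    # pd.concat raises ValueError on an empty list, so guard that case.
    if not frames:
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)

# Hypothetical payloads standing in for the API responses.
fake_api = {
    1: [{"x": 10, "shooter": {"name": "A"}}],
    2: [],
    3: [{"x": 30, "shooter": {"name": "B"}}],
}

combined = collect_frames(range(1, 4), fake_api.get)
print(combined)
```

In real use, fetch_json could be a function that calls requests.get(url, timeout=10).json() for the given id, and combined.to_csv("matsit.csv", index=False) would then write the file.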