Scrape specific JSON data to a CSV

I am trying to scrape some JSON data. The first few rows are as follows, and the rest are in the same format. JSON data:

{
  "data": [
    {
      "date": "2011-10-07",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-08",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-12",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-13",
      "f(avg(output_total)/number(100000000))": 54.0515120216902
    },.......]

I want to extract each date together with its value (for the sample above: 2011-10-07 and 50, 2011-10-08 and 50, etc.) into a CSV file with two columns, date and value.

How can I do this? Is it possible with Python?

This is how I fetched the JSON data:

import requests

url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'

proxies = {}  # no proxy configured
response = requests.get(url=url, proxies=proxies)
print(response.content)  # raw response body (bytes)
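`response.content` is the raw response body as bytes, which is why printing it just shows the JSON text. A minimal sketch of turning those bytes into (date, value) pairs, assuming the body has the shape shown above. `json.loads` accepts `bytes` directly (Python 3.6+), and `requests` also offers `response.json()` for the same parsing step. The helper name `rows_from_body` is hypothetical, and an offline byte string stands in for `response.content` so the example runs without network access.

```python
import json

# The aggregate expression the API uses as the key for the value column.
VALUE_KEY = "f(avg(output_total)/number(100000000))"

def rows_from_body(body: bytes):
    """Parse a raw JSON response body and return (date, value) pairs."""
    payload = json.loads(body)  # json.loads detects the UTF encoding of bytes input
    # Look each field up by key instead of relying on dict ordering.
    return [(item["date"], item[VALUE_KEY]) for item in payload.get("data", [])]

# Offline stand-in for response.content
body = b'{"data": [{"date": "2011-10-07", "f(avg(output_total)/number(100000000))": 50}]}'
print(rows_from_body(body))  # [('2011-10-07', 50)]
```

With a live request, `rows_from_body(response.content)` and `response.json()['data']` work on the same data.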

Answer:

json = {
  "data": [
    {
      "date": "2011-10-07",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-08",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-12",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-13",
      "f(avg(output_total)/number(100000000))": 54.0515120216902
    }]}

Step 1: Convert the JSON into a pandas DataFrame

import pandas as pd

df = pd.DataFrame(json['data'])

Step 2: Filter the DataFrame on a condition (e.g. value == 50)

df_filtered = df[(df["f(avg(output_total)/number(100000000))"] == 50)]

Step 3: Save the DataFrame to a CSV file, at whatever location on your computer you want to store it.

df_filtered.to_csv(r'C:\user\foo\output.csv', index = False)

If you want to include the index, simply remove index = False.
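If the folder you pick in Step 3 may not exist yet, writing to it fails. A sketch that creates the parent folders first, using only the standard library (`pathlib`, `csv`) rather than `DataFrame.to_csv`; the `output/litecoin.csv` path and the `write_rows` helper are placeholder assumptions, not part of the answer above.

```python
import csv
from pathlib import Path

def write_rows(path, rows, header=("date", "value")):
    """Write (date, value) rows to a CSV file, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    with path.open("w", newline="") as f:  # newline="" avoids blank rows on Windows
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path

out = write_rows(Path("output") / "litecoin.csv", [("2011-10-07", 50), ("2011-10-08", 50)])
print(out.read_text())
```

The same `mkdir(parents=True, exist_ok=True)` call can precede `df_filtered.to_csv(...)` if you stay with pandas.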

Answer:

pandas allows you to solve this one in a few lines:

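import pandas as pd

# json_data is the parsed response, e.g. json_data = response.json()
df = pd.DataFrame(json_data['data'])
df.columns = ["date", "value"]  # relies on "date" being the first key in each record
df.to_csv("data.csv", index=False)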

Answer:

You can do it like this: parse the JSON string, iterate over the records, extract the data you need, and then write it to a CSV file.

import json
import csv
fields = ['Date', 'Value']
filename = 'test.csv'
s = """
{
   "data":[
      {
         "date":"2011-10-07",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-08",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-12",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-13",
         "f(avg(output_total)/number(100000000))":54.0515120216902
      }
   ]
}
"""
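x = json.loads(s)  # parse the JSON string into a dict
with open(filename, 'w', newline='') as f:  # newline='' prevents blank rows on Windows
    cw = csv.writer(f)
    cw.writerow(fields)  # header row

    for i in x['data']:
        cw.writerow(i.values())  # date, value (in the order the keys appear)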
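`i.values()` writes the fields in whatever order they appear in each JSON object, so the columns line up with the header only because `date` happens to come first. A sketch that names the columns explicitly with `csv.DictWriter`, assuming the same `x['data']` structure; the `to_csv_text` helper is hypothetical, and it writes to a string buffer so the result is easy to inspect (swap in `open(filename, 'w', newline='')` to write a file).

```python
import csv
import io

VALUE_KEY = "f(avg(output_total)/number(100000000))"

def to_csv_text(payload):
    """Render payload["data"] as CSV text with explicit 'date' and 'value' columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["date", "value"], lineterminator="\n")
    writer.writeheader()
    for item in payload["data"]:
        # Map the API's field names onto the output column names by key, not position.
        writer.writerow({"date": item["date"], "value": item[VALUE_KEY]})
    return buf.getvalue()

x = {"data": [{VALUE_KEY: 54.0515120216902, "date": "2011-10-13"}]}  # keys deliberately reversed
print(to_csv_text(x), end="")
```

Even with the keys in reverse order, the output keeps `date` in the first column.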

test.csv

Date,Value
2011-10-07,50
2011-10-08,50
2011-10-12,50
2011-10-13,54.0515120216902
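A spreadsheet program may reformat dates and round numbers when it displays a CSV, so reading the file back in Python is a more reliable check of what was written. A sketch using `csv.DictReader`, assuming the `Date`/`Value` header from the code above; `read_back` is a hypothetical helper, and the example first writes a small stand-in `test.csv` from the sample rows so it runs on its own.

```python
import csv

def read_back(filename):
    """Return the rows of a Date/Value CSV as (date, float) tuples."""
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)  # uses the first row as the header
        return [(row["Date"], float(row["Value"])) for row in reader]

# Create a small stand-in for test.csv so the example runs on its own.
with open("test.csv", "w", newline="") as f:
    csv.writer(f).writerows([["Date", "Value"], ["2011-10-07", 50], ["2011-10-13", 54.0515120216902]])

print(read_back("test.csv"))  # [('2011-10-07', 50.0), ('2011-10-13', 54.0515120216902)]
```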

Answer:

If you just want a CSV file without relying on pandas (only requests, which you are already using), it's very simple:

import requests
CSV = 'blockchair.csv'
url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
with requests.Session() as session:
    response = session.get(url)
    response.raise_for_status()
    with open(CSV, 'w') as csv:
        csv.write('Date,Value\n')
        for d in response.json()['data']:
            for i, v in enumerate(d.values()):
                if i > 0:
                    csv.write(',')
                csv.write(str(v))
            csv.write('\n')
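Writing the separators by hand works here because dates and numbers never contain commas or quotes. If a field ever could, the standard-library `csv` module (no extra install needed) quotes it correctly. A small comparison, where the comma-containing value is made up purely for illustration:

```python
import csv
import io

row = ["2011-10-07", "1,234.5"]  # hypothetical value that contains a comma

manual = ",".join(str(v) for v in row)  # naive join: the comma splits the field in two

buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(row)  # default QUOTE_MINIMAL quotes the field

print(manual)                  # 2011-10-07,1,234.5  (reads back as 3 columns)
print(buf.getvalue(), end="")  # 2011-10-07,"1,234.5"
```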

Answer:

You can try this:

import requests
import csv
import pandas as pd

url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
csv_name = 'res_values_1.csv'

response = requests.get(url=url).json()
res_data = response.get('data', [])

# Solution using pandas
res_df = pd.DataFrame(res_data)
res_df.rename(columns={'f(avg(output_total)/number(100000000))': 'value'}, inplace=True)

# keep only rows whose value is >= 50
filtered_res_df = res_df[(res_df["value"] >= 50)]
filtered_res_df.to_csv(csv_name, sep=',', encoding='utf-8', index = False)

# Solution using csv
csv_name = 'res_values_2.csv'
headers = ['date', 'value']
with open(csv_name, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)

    for data in res_data:
        values = list(data.values())
        if values[1] >= 50:
            writer.writerow(values)
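Both solutions above hard-code the `>= 50` condition. A sketch that takes the condition as a parameter, so the same code covers the `== 50` filter from the first answer, a strict `> 50`, or no filter at all. `filter_rows` and its arguments are hypothetical names; the sample rows are taken from the question's data.

```python
import operator

VALUE_KEY = "f(avg(output_total)/number(100000000))"

def filter_rows(data, threshold=None, op=operator.ge):
    """Return (date, value) pairs where op(value, threshold) is true.

    With threshold=None every row is kept.
    """
    rows = [(item["date"], item[VALUE_KEY]) for item in data]
    if threshold is None:
        return rows
    return [(d, v) for d, v in rows if op(v, threshold)]

data = [
    {"date": "2011-10-07", VALUE_KEY: 50},
    {"date": "2011-10-13", VALUE_KEY: 54.0515120216902},
]
print(filter_rows(data, 50))               # >= 50 keeps both rows
print(filter_rows(data, 50, operator.eq))  # == 50 keeps only 2011-10-07
print(filter_rows(data, 50, operator.gt))  # > 50 keeps only 2011-10-13
```

The resulting pairs can go straight into `writer.writerows(...)`.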

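CSV output (from the pandas solution; the csv-module version writes 50 rather than 50.0, because pandas stores the mixed int/float column as floats):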

date,value
2011-10-07,50.0
2011-10-08,50.0
2011-10-12,50.0
2011-10-13,54.0515120216902
...
2021-10-05,346.12752821011594
2021-10-06,293.5061907016782
2021-10-07,333.17665010641673
2021-10-08,332.2437737707938