Home > Blockchain >  Scrape specific Json data to a csv
Scrape specific Json data to a csv

Time:10-08

I am trying to scrape some json data. The first few rows ae as follows and all the latter is in the same format. Json data:

{
  "data": [
    {
      "date": "2011-10-07",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-08",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-12",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-13",
      "f(avg(output_total)/number(100000000))": 54.0515120216902
    },.......]

I am willing scrape the date with the its relevant value (like fi=or the above, 2011-10-07 and 50, 2011-10-08 and 50 etc.) into a csv file which contains two columns (date and value)

How can I proceed this? is it possible with python?

This is how I grabbed the json data:

import os
import requests

url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'

proxies = {}
response = requests.get(url=url, proxies=proxies)
print(response.content)

CodePudding user response:

json = {
  "data": [
    {
      "date": "2011-10-07",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-08",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-12",
      "f(avg(output_total)/number(100000000))": 50
    },
    {
      "date": "2011-10-13",
      "f(avg(output_total)/number(100000000))": 54.0515120216902
    }]}

Step 1: Convert json into a Pandas Dataframe

df = pd.DataFrame(json['data'])

Step 2: Filter Df based on conditions ( e.g >>> value = 50)

df_filtered = df[(df["f(avg(output_total)/number(100000000))"] == 50)]

Step 3: Save df into csv file and choose the location where you like to store the CSV file on your computer.

df_filtered.to_csv(r'C:\user\foo\output.csv', index = False)

if you wish to include the index, then simply remove index = False

CodePudding user response:

pandas allows you to solve this one in a few lines:

import pandas as pd
df = pd.DataFrame(json_data['data'])
df.columns = ["date", "value"]
df.to_csv("data.csv", index=False)

CodePudding user response:

You can do like this.

Iterate over the JSON string, extract the data you need and then write that data to CSV file.

import json
import csv
fields = ['Date', 'Value']
filename = 'test.csv'
s = """
{
   "data":[
      {
         "date":"2011-10-07",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-08",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-12",
         "f(avg(output_total)/number(100000000))":50
      },
      {
         "date":"2011-10-13",
         "f(avg(output_total)/number(100000000))":54.0515120216902
      }
   ]
}
"""
x = json.loads(s)
with open(filename, 'w', newline='') as f:
    cw = csv.writer(f)
    cw.writerow(fields)

    for i in x['data']:
        cw.writerow(i.values())

test.csv

Date        Value
07-10-11    50
08-10-11    50
12-10-11    50
13-10-11    54.05151202

CodePudding user response:

If you just want a CSV file without relying on any additional Python modules (such as pandas) then it's very simple:

import requests
CSV = 'blockchair.csv'
url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
with requests.Session() as session:
    response = session.get(url)
    response.raise_for_status()
    with open(CSV, 'w') as csv:
        csv.write('Date,Value\n')
        for d in response.json()['data']:
            for i, v in enumerate(d.values()):
                if i > 0:
                    csv.write(',')
                csv.write(str(v))
            csv.write('\n')

CodePudding user response:

You can try this:

import requests
import csv
import pandas as pd

url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
csv_name = 'res_values_1.csv'

response = requests.get(url=url).json()
res_data = response.get('data', [])

# Solution using pandas
res_df = pd.DataFrame(res_data)
res_df.rename(columns={'f(avg(output_total)/number(100000000))': 'value'}, inplace=True)

# filter data those value in >= 50
filtered_res_df = res_df[(res_df["value"] >= 50)]
filtered_res_df.to_csv(csv_name, sep=',', encoding='utf-8', index = False)

# Solution using csv
csv_name = 'res_values_2.csv'
headers = ['date', 'value']
with open(csv_name, 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)

    for data in res_data:
        values = list(data.values())
        if values[1] >= 50:
            writer.writerow(values)

CSV Output:

date,value
2011-10-07,50.0
2011-10-08,50.0
2011-10-12,50.0
2011-10-13,54.0515120216902
.
.
.
2021-10-05,346.12752821011594
2021-10-06,293.5061907016782
2021-10-07,333.17665010641673
2021-10-08,332.2437737707938
  • Related