I am trying to scrape some json data. The first few rows ae as follows and all the latter is in the same format. Json data:
{
"data": [
{
"date": "2011-10-07",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-08",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-12",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-13",
"f(avg(output_total)/number(100000000))": 54.0515120216902
},.......]
I am willing scrape the date with the its relevant value (like fi=or the above, 2011-10-07 and 50, 2011-10-08 and 50 etc.) into a csv file which contains two columns (date and value)
How can I proceed this? is it possible with python?
This is how I grabbed the json data:
import os
import requests
url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
proxies = {}
response = requests.get(url=url, proxies=proxies)
print(response.content)
CodePudding user response:
json = {
"data": [
{
"date": "2011-10-07",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-08",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-12",
"f(avg(output_total)/number(100000000))": 50
},
{
"date": "2011-10-13",
"f(avg(output_total)/number(100000000))": 54.0515120216902
}]}
Step 1: Convert json into a Pandas Dataframe
df = pd.DataFrame(json['data'])
Step 2: Filter Df based on conditions ( e.g >>> value = 50)
df_filtered = df[(df["f(avg(output_total)/number(100000000))"] == 50)]
Step 3: Save df into csv file and choose the location where you like to store the CSV file on your computer.
df_filtered.to_csv(r'C:\user\foo\output.csv', index = False)
if you wish to include the index, then simply remove index = False
CodePudding user response:
pandas allows you to solve this one in a few lines:
import pandas as pd
df = pd.DataFrame(json_data['data'])
df.columns = ["date", "value"]
df.to_csv("data.csv", index=False)
CodePudding user response:
You can do like this.
Iterate over the JSON string, extract the data you need and then write that data to CSV file.
import json
import csv
fields = ['Date', 'Value']
filename = 'test.csv'
s = """
{
"data":[
{
"date":"2011-10-07",
"f(avg(output_total)/number(100000000))":50
},
{
"date":"2011-10-08",
"f(avg(output_total)/number(100000000))":50
},
{
"date":"2011-10-12",
"f(avg(output_total)/number(100000000))":50
},
{
"date":"2011-10-13",
"f(avg(output_total)/number(100000000))":54.0515120216902
}
]
}
"""
x = json.loads(s)
with open(filename, 'w', newline='') as f:
cw = csv.writer(f)
cw.writerow(fields)
for i in x['data']:
cw.writerow(i.values())
test.csv
Date Value
07-10-11 50
08-10-11 50
12-10-11 50
13-10-11 54.05151202
CodePudding user response:
If you just want a CSV file without relying on any additional Python modules (such as pandas) then it's very simple:
import requests
CSV = 'blockchair.csv'
url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
with requests.Session() as session:
response = session.get(url)
response.raise_for_status()
with open(CSV, 'w') as csv:
csv.write('Date,Value\n')
for d in response.json()['data']:
for i, v in enumerate(d.values()):
if i > 0:
csv.write(',')
csv.write(str(v))
csv.write('\n')
CodePudding user response:
You can try this:
import requests
import csv
import pandas as pd
url='https://api.blockchair.com/litecoin/transactions?a=date,f(avg(output_total)/number(100000000))'
csv_name = 'res_values_1.csv'
response = requests.get(url=url).json()
res_data = response.get('data', [])
# Solution using pandas
res_df = pd.DataFrame(res_data)
res_df.rename(columns={'f(avg(output_total)/number(100000000))': 'value'}, inplace=True)
# filter data those value in >= 50
filtered_res_df = res_df[(res_df["value"] >= 50)]
filtered_res_df.to_csv(csv_name, sep=',', encoding='utf-8', index = False)
# Solution using csv
csv_name = 'res_values_2.csv'
headers = ['date', 'value']
with open(csv_name, 'w', newline='') as f:
writer = csv.writer(f)
writer.writerow(headers)
for data in res_data:
values = list(data.values())
if values[1] >= 50:
writer.writerow(values)
CSV Output:
date,value
2011-10-07,50.0
2011-10-08,50.0
2011-10-12,50.0
2011-10-13,54.0515120216902
.
.
.
2021-10-05,346.12752821011594
2021-10-06,293.5061907016782
2021-10-07,333.17665010641673
2021-10-08,332.2437737707938