I have a Json file that I am converting and appending to a csv, and I am trying to also add a date column in the month/day/year format.
latest attempt:
import json
from csv import DictWriter
import pandas as pd
import datetime
with open('output.json', 'r') as inp, open('output.csv', 'a') as outp:
writer = DictWriter(outp, fieldnames=[
'name', 'price'])
for line in inp:
row = json.loads(line)
writer.writerow(row)
a = pd.read_csv('output.csv')
a['Date'] = [datetime.date.today()]*len(a)
a.to_csv('adate.csv')
I get close to what I am trying to do. output:
Lattice Top 6'W x 6'H King Cedar Aluminum/Composite Horizontal Fence-Starter Section-SURFACE MOUNT,"4,300 .",Date
"FENCESMART4U XCEL - Black Steel Anti-Rust Fence Gate - Sharp End Pickets - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",199 .,2022-10-18
"XCEL - Black Steel Anti-Rust Fence Gate - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",206 .,2022-10-18
"LZMZMQ Courtyard Baluster Garden Fencing Villa Outdoor Guardrail, Wide 100 120 140 160 180 cm Aluminum Fence Panels Deckorators, Grey Panels Decor Free Standing for Weatherproof","1,771 .",2022-10-18
the date gets inserted on the first row. I have tried adding column headers such as
headerList = ['name', 'price', 'date']
and that didn't go over well under the first column it removed the product names.
this is how the json file is reading in:
{"name": "Lattice Top 6'W x 6'H King Cedar Aluminum/Composite Horizontal Fence-Starter Section-SURFACE MOUNT", "price": "4,300 ."}
{"name": "FENCESMART4U XCEL - Black Steel Anti-Rust Fence Gate - Sharp End Pickets - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate", "price": "199 ."}
{"name": "XCEL - Black Steel Anti-Rust Fence Gate - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate", "price": "206 ."}
{"name": "LZMZMQ Courtyard Baluster Garden Fencing Villa Outdoor Guardrail, Wide 100 120 140 160 180 cm Aluminum Fence Panels Deckorators, Grey Panels Decor Free Standing for Weatherproof", "price": "1,771 ."}
CodePudding user response:
You can do this all in pandas
directly:
import pandas as pd
df = pd.read_json('output.json', orient='records', lines=True)
df['Date'] = pd.to_datetime('today').strftime('%m/%d/%Y')
df.to_csv('adate.csv', index=False)
Output (for your sample data):
name,price,Date
Lattice Top 6'W x 6'H King Cedar Aluminum/Composite Horizontal Fence-Starter Section-SURFACE MOUNT,"4,300 .",10/22/2022
"FENCESMART4U XCEL - Black Steel Anti-Rust Fence Gate - Sharp End Pickets - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",199 .,10/22/2022
"XCEL - Black Steel Anti-Rust Fence Gate - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",206 .,10/22/2022
"LZMZMQ Courtyard Baluster Garden Fencing Villa Outdoor Guardrail, Wide 100 120 140 160 180 cm Aluminum Fence Panels Deckorators, Grey Panels Decor Free Standing for Weatherproof","1,771 .",10/22/2022
CodePudding user response:
You can do this directly without pandas
if desired. Make sure to use newline=''
on files used with the csv
module. (why)
import json
from csv import DictWriter
from datetime import datetime
with open('output.json', 'r', newline='') as inp, open('output.csv', 'w', newline='') as outp:
writer = DictWriter(outp, fieldnames=['name', 'price', 'date'])
writer.writeheader()
for line in inp:
row = json.loads(line)
row['date'] = datetime.now().strftime('%m/%d/%Y')
writer.writerow(row)
Output:
name,price,date
Lattice Top 6'W x 6'H King Cedar Aluminum/Composite Horizontal Fence-Starter Section-SURFACE MOUNT,4300.00,10/21/2022
"FENCESMART4U XCEL - Black Steel Anti-Rust Fence Gate - Sharp End Pickets - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",199.00,10/21/2022
"XCEL - Black Steel Anti-Rust Fence Gate - 4ft W x 5ft H - Easy Installation Kit, for Residential, Outdoor, Yard, Patio, Entry Way, 3-Rail Metal Gate",206.00,10/21/2022
"LZMZMQ Courtyard Baluster Garden Fencing Villa Outdoor Guardrail, Wide 100 120 140 160 180 cm Aluminum Fence Panels Deckorators, Grey Panels Decor Free Standing for Weatherproof",1771.00,10/21/2022
CodePudding user response:
This solution will make use of the parameter restval
, which means if a value is missing, the DictWriter
can use this value to fill in.
import csv
import json
import datetime
today = datetime.date.today()
formatted_date = today.strftime("%Y/%m/%d")
with open("output.json") as instream, open("output.csv", "w") as outstream:
writer = csv.DictWriter(
outstream,
fieldnames=["name", "price", "date"],
restval=formatted_date
)
writer.writeheader()
for line in instream:
row = json.loads(line)
writer.writerow(row)
So what I have here is almost the same as the original poster's code, but with restval
.