I am currently developing a project which uses the Twint Python library for scraping Twitter. The problem is that the file it saves the scraped data to is invalid with regard to the JSON formatting standard. All of the scraped tweets are saved as objects inside the JSON file, and when I try to parse them I get an error, since they aren't separated with commas and aren't in an array:
{'key1': value1, 'key2': value2,}
{'key1': value1, 'key2': value2,}
{'key1': value1, 'key2': value2,}
as opposed to:
[
{'key1': value1, 'key2': value2,},
{'key1': value1, 'key2': value2,},
{'key1': value1, 'key2': value2,}
]
My question is, can I fix this by writing a script that wraps the list of objects in an array and separates the objects with commas?
CodePudding user response:
Replace } with },? Or is that too simple a solution?
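That idea can be taken one step further: join the lines with commas and wrap the whole thing in brackets, which produces a single valid JSON array. A minimal sketch (the sample data is hypothetical; Twint writes one object per line):

```javascript
// Hypothetical newline-delimited data, as Twint would write it.
const ndjson = '{"key1": 1}\n{"key1": 2}\n{"key1": 3}\n';

// Join the per-line objects with commas and wrap them in brackets,
// skipping the empty line a trailing newline would leave behind.
const asJsonArray =
  "[" +
  ndjson
    .split("\n")
    .filter(line => line.trim() !== "")
    .join(",") +
  "]";

// The result is standard JSON and parses in one go.
const parsed = JSON.parse(asJsonArray);
```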
CodePudding user response:
I think this choice of formatting is deliberate, so you can stream the downloaded data into your app instead of loading and parsing it all at once (I assume it could get quite large if you are scraping Twitter); this line-delimited format is commonly called NDJSON or JSON Lines. Based on your node.js tag, I assume you want to do the JSON parsing in the backend. There is a variety of packages you could use for that, it can also be done with Rx/observables, or you could implement it yourself by streaming the data until a line break \n, parsing that line, and continuing. For your own research, start looking for "JSON streaming" on npm, GitHub, and the web.
CodePudding user response:
This should give you an array of parsed JavaScript objects:
const results = data.split("\n").filter(line => line.trim() !== "").map(line => JSON.parse(line))
The filter step guards against a trailing newline, which would otherwise make JSON.parse throw on an empty string.
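A quick run of that split-and-parse approach on hypothetical sample data:

```javascript
// Two tweets in Twint's one-object-per-line format, with a trailing newline.
const data = '{"id": 1, "tweet": "a"}\n{"id": 2, "tweet": "b"}\n';

// Split into lines, drop the empty trailing line, parse each object.
const results = data
  .split("\n")
  .filter(line => line.trim() !== "")
  .map(line => JSON.parse(line));
```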