Scrapy: Converting a JavaScript array to JSON in Python

Time:06-18

I have been struggling with a site I am scraping using Scrapy. This site returns a series of JavaScript variables (arrays) containing the product data. Example:

datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
datos[1] = ["12346","3M GREEN CAT5E CABLE","7.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
...

And so on.

Getting the arrays as a string with Scrapy was easy, since the site's response prints the variables. The problem is that I want to transform them into JSON so I can process the data and store it in a database table.

Normally I would use JavaScript's JSON.stringify function to convert it to JSON and post it to PHP.

However, when I use Python's json.loads (and even StringIO), I am unable to load the array as JSON.

It is probably a format error, but I cannot identify it, since I am not an expert in JSON or Python.

EDIT: I just realized that, since Scrapy cannot execute JavaScript, the main issue is probably that the data is just a string. I need to convert it into JSON format myself.
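To see where the format error comes from, here is a minimal Python sketch that feeds the first sample line to json.loads as-is, and then again with the `datos[0] = ` assignment stripped off. It assumes the line arrives exactly as in the sample, with the empty single-quoted string `''` as its last element; `try_json` is a hypothetical helper written only for this illustration.

```python
import json

# One raw line as it appears in the page (first sample row from the question).
line = """datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];"""


def try_json(text):
    """Hypothetical helper: return the parsed value, or the decode error message."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        return f"JSONDecodeError: {exc.msg}"


# 1. The whole line fails: "datos[0] = ...;" is a JavaScript statement, not JSON.
print(try_json(line))

# 2. Keep only the right-hand side of the assignment and drop the trailing ";".
array_text = line.split("=", 1)[1].strip().rstrip(";")

# This still fails: JSON only allows double-quoted strings, so '' is rejected.
print(try_json(array_text))

# 3. With '' rewritten as "", the array is valid JSON and parses into a list.
print(try_json(array_text.replace("''", '""')))
```

So the data really is just JavaScript source text: both the assignment around the array and the single-quoted string have to be dealt with before a JSON parser will accept it.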

Any help is more than welcome.

Thank you.

Answer:

If you want to take an array and turn it into a JSON object, you could do something like this:

import json

values = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1]
# Use each element's position as its key: {0: "12345", 1: "3M YELLOW CAT5E CABLE", ...}
d = dict(enumerate(values))
# json.dumps turns the integer keys into strings: {"0": "12345", ...}
x = json.dumps(d)
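Note that a dict is only needed if you want a JSON object; a Python list serializes directly to a JSON array. The sketch below shows both, using the first three fields of the sample row. The column names in `FIELDS` are hypothetical placeholders, since the question does not say what each position means; in practice they would be your database column names.

```python
import json

# First three fields of the first sample row.
values = ["12345", "3M YELLOW CAT5E CABLE", "6.81"]

# A list maps straight to a JSON array; no keys are required.
as_array = json.dumps(values)

# Hypothetical column names, for illustration only: the meaning of each
# position is not given in the question.
FIELDS = ["code", "description", "price"]

# zip pairs each name with the value at the same position.
as_object = json.dumps(dict(zip(FIELDS, values)))

print(as_array)   # ["12345", "3M YELLOW CAT5E CABLE", "6.81"]
print(as_object)  # {"code": "12345", "description": "3M YELLOW CAT5E CABLE", "price": "6.81"}
```

Named keys make the stored JSON self-describing, while the plain array keeps the same positional layout as the original JavaScript data.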

Answer:

The Scrapy documentation has a section describing various ways to parse JavaScript code. In your case, if you just need the data as an array, you can use a regular expression to extract it.

Since the question does not name the website you are scraping, I am assuming this is the most straightforward way to get the data, but you can use whichever approach suits you best.
