Is there a way I can extract a list from a javascript document?

Time:12-10

There is a website where I need to obtain the owners of an online-game item, and from my research I need to do some web scraping to get this data. However, the information is inside JavaScript code, not in HTML that bs4 can parse easily. I need to read a variable from this JavaScript (it contains a list of owners of the item I'm looking at) and turn it into a usable list, JSON object, or string for my program. Is there a way to do this, and if so, how?

I've attached an image of the variable I need when viewing the page source of the site I'm on.

My current code:

import requests
from bs4 import BeautifulSoup

html = requests.get('https://www.rolimons.com/item/1029025').content #the item webpage
soup = BeautifulSoup(html, "lxml")
datas = soup.find_all("script")
print(datas) #prints the <script> sections of the website content


CodePudding user response:

import requests
import json
import re

r = requests.get('...')
m = re.search(r'var history_data\s+=\s+(.*)', r.text)
# strip a trailing semicolon, if present, so the remainder parses as JSON
print(json.loads(m.group(1).strip().rstrip(';')))
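
If the variable is missing from the page, re.search returns None and m.group(1) raises an AttributeError. Here is a minimal sketch of the same idea wrapped in a helper that returns None instead. The function name extract_js_var and the sample page source are made up for illustration, and it assumes the value sits on one line and is valid JSON:

```python
import json
import re


def extract_js_var(page_source, name):
    """Return the JSON value assigned in `var <name> = ...;`, or None if absent."""
    pattern = r'var\s+' + re.escape(name) + r'\s*=\s*(.*)'
    match = re.search(pattern, page_source)
    if match is None:
        return None
    # Drop surrounding whitespace and the statement-ending semicolon, if any.
    literal = match.group(1).strip().rstrip(';')
    return json.loads(literal)


# Hypothetical page source, for illustration only (not the real site's markup).
sample = '''<script>
var history_data = {"id": 1, "timestamps": [10, 20, 30]};
</script>'''

data = extract_js_var(sample, 'history_data')
print(data['timestamps'])
```

On a real page you would pass r.text as page_source and check the result for None before using it.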

CodePudding user response:

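BeautifulSoup alone cannot read a JavaScript variable: it parses HTML and treats the contents of a script tag as plain text. A regular expression (re) is needed to pull the assignment out of that text.

Then use ast.literal_eval to convert the string representation of the dict into a dict.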

from bs4 import BeautifulSoup
import requests
import re
import ast

html = requests.get('https://www.rolimons.com/item/1029025').content #the item webpage
soup = BeautifulSoup(html, "lxml")

# search the full markup: since Beautiful Soup 4.9.0, soup.text generally omits <script> contents
ownership_data = re.search(r'ownership_data\s+=\s+.*;', str(soup)).group(0)
ownership_data_dict = ast.literal_eval(ownership_data.split('=', 1)[1].strip().rstrip(';'))
print(ownership_data_dict)

Output:

> {'id': 1029025, 'num_points': 1616, 'timestamps': [1491004800,
> 1491091200, 1491177600, 1491264000, 1491350400, 1491436800,
> 1491523200, 1491609600, 1491696000, 1491782400, 1491868800,
> 1491955200, 1492041600, 1492128000, 1492214400, 1492300800,
> 1492387200, 1492473600, 1492560000, 1492646400, 1492732800,
> 1492819200, ...}
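
One caveat: ast.literal_eval only works when the JavaScript literal also happens to be a valid Python literal. If the data contains true, false or null, literal_eval raises a ValueError, while json.loads handles those but rejects single-quoted strings. A rough sketch that tries JSON first and falls back to literal_eval is below. The helper name parse_js_literal, the sample strings, and the "active" and "owner" keys are hypothetical and not taken from the site:

```python
import ast
import json


def parse_js_literal(text):
    """Parse a JS object literal that is valid JSON or a valid Python literal."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # JSON rejects single-quoted strings; a Python literal parse accepts them.
        return ast.literal_eval(text)


# Hypothetical inputs, for illustration only.
json_style = '{"id": 1029025, "active": true, "owner": null}'
python_style = "{'id': 1029025, 'num_points': 1616}"

parsed_json = parse_js_literal(json_style)
parsed_python = parse_js_literal(python_style)
print(parsed_json)
print(parsed_python)
```

Neither parser copes with unquoted keys or other JavaScript-only syntax, so for pages like that a dedicated JavaScript parser would be needed.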