How can I write a json webpage into a json file using selenium driver.page_source?

Time:12-31

I found a webpage that returns a JSON response and scraped it with Selenium using the code below:

from selenium import webdriver

driver = webdriver.Chrome()  # assumption: the original does not show which driver was created
url = "website.json"
driver.get(url)
text = driver.page_source

with open("data.json", "w", encoding="utf-8") as html_file:
    html_file.write(text)

But when I open the file it is like this:

<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">{
  "Status": "OK",
  "TotalRows": 386,
  "Items": [
   ...
   ]
}</pre></body></html>

So the JSON ends up wrapped in HTML tags. To solve this problem I tried this code:

t1 = text.replace('<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">', "")
t2 = t1.replace('</pre></body></html>', "")

with open('data.json', 'w') as outfile:
    json.dump(t2, outfile, indent=2)

But when I run this, data.json contains a single escaped string like this:

"{\n  \"Status\": \"OK\",\n  \"TotalRows\": 401,\n  \"Items\": [\n  ...\n  ]\n}"

What should I do?

CodePudding user response:

You should find only the element you want, extract its text, and parse that text before writing it:

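import json
from selenium.webdriver.common.by import By

driver.get(url)
# The browser shows the raw JSON inside a <pre> element
element = driver.find_element(By.TAG_NAME, 'pre')
with open('data.json', 'w', encoding='utf-8') as file:
    json.dump(json.loads(element.text), file, indent=2)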
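
Note that json.dump serializes whatever Python object it is given. A str is written as one quoted JSON string, which is the escaped output shown in the question, while a dict obtained from json.loads is written as a JSON object. A minimal sketch of the difference, using a made-up sample payload rather than the real response:

```python
import json

raw = '{"Status": "OK", "TotalRows": 386}'  # made-up sample payload

# Serializing the str itself: the result is a quoted, escaped JSON string.
as_string = json.dumps(raw)

# Parsing first, then serializing: the result is a JSON object.
as_object = json.dumps(json.loads(raw))

print(as_string)
print(as_object)
```

The first line printed starts with a double quote and has every inner quote escaped. The second is a plain JSON object.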

CodePudding user response:

It seems that after stripping the HTML tags with:

t1 = text.replace('<html><head></head><body><pre style="word-wrap: break-word; white-space: pre-wrap;">', "")
t2 = t1.replace('</pre></body></html>', "")

I need to parse t2 with json.loads before dumping it, as below:

res = json.loads(t2)
with open('data.json', 'w') as outfile:
    json.dump(res, outfile, indent=4)

This worked for me.
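
The replace calls above depend on the exact markup the browser generates, including the style attribute. A somewhat more tolerant sketch, using only the standard library, takes whatever sits inside the first pre element, unescapes HTML entities, and parses the result. The helper name extract_json_from_page_source and the sample page are hypothetical, and this assumes the JSON is the only content of that pre element:

```python
import html
import json
import re


def extract_json_from_page_source(page_source):
    """Return the parsed JSON found inside the first <pre> element."""
    match = re.search(r"<pre[^>]*>(.*?)</pre>", page_source,
                      re.DOTALL | re.IGNORECASE)
    if match is None:
        raise ValueError("no <pre> element found in page source")
    # page_source is HTML, so characters such as & or < arrive as entities.
    raw = html.unescape(match.group(1))
    return json.loads(raw)


# Made-up sample shaped like the page source shown in the question.
sample = (
    '<html><head></head><body>'
    '<pre style="word-wrap: break-word; white-space: pre-wrap;">'
    '{"Status": "OK", "TotalRows": 386, "Items": []}'
    '</pre></body></html>'
)
data = extract_json_from_page_source(sample)

with open("data.json", "w", encoding="utf-8") as outfile:
    json.dump(data, outfile, indent=4)
```

With Selenium, the call would be extract_json_from_page_source(driver.page_source).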
