I'm trying to remove the extra whitespace and the "rebtel.bootstrappedData" prefix at the start of the second line of my output, but for some reason it won't work.
This is my output:
"welcome_offer_cuba.block_1_title":"SaveonrechargetoCuba","welcome_offer_cuba.block_1_cta":"Sendrecharge!","welcome_offer_cuba.block_1_cta_prebook":"Pre-bookRecarga","welcome_offer_cuba.block_1_footprint":"Offervalidfornewusersonly.","welcome_offer_cuba.block_2_key":"","welcome_offer_cuba.block_2_title":"Howtosendarecharge?","welcome_offer_cuba.block_2_content":"<ol><li>Simplyenterthenumberyou’dliketosendrechargeinthefieldabove.</li><li>Clickthe“{{buttonText}}”button.</li><li>CreateaRebtelaccountifyouhaven’talready.</li><li>Done!Yourfriendshouldreceivetherechargeshortly.</li></ol>","welcome_offer_cuba.block_3_title":"DownloadtheRebtelapp!","welcome_offer_cuba.block_3_content":"Sendno-feerechargeandenjoythebestcallingratestoCubainoneplace."},"canonical":{"string":"<linkrel=\"canonical\"href=\"https://www.rebtel.com/en/rates/\"/>"}};
rebtel.bootstrappedData={"links":{"summary":{"collection":"country_links","ids":[null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null,null],"params":{"locale":"en"},"meta":{}},"data":[{"title":"A","links":[{"iso2":"AF","route":"afghanistan","name":"Afghanistan","url":"/en/rates/afghanistan/","callingCardsUrl":"/en/calling-cards/afghanistan/","popular":false},{"iso2":"AL","route":"albania","name":"Albania","url":"/en/rates/albania/
And this is the code I used:
import json
import requests
from bs4 import BeautifulSoup
url = "https://www.rebtel.com/en/rates/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
x = range(132621, 132624)
script = soup.find_all("script")[4].text.strip()[38:]
print(script)
What should I add to "script" so it will remove the empty spaces?
User response:
Original answer
You can change the definition of your script variable to:
script = soup.find_all("script")[4].text.replace("\t", "")[38:]
This removes every tab character from the text, not just those at the start and end, so the indentation in front of each line disappears.
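To see why strip() alone was not enough, here is a minimal sketch on a made-up string (the variable names and the sample text are assumptions, not the real page content). strip() only trims whitespace at the two ends of the whole string, while replace("\t", "") also removes the tabs in front of the inner lines:

```python
# Hypothetical sample: two tab-indented lines, like the script text on the page.
text = "\t\tfirst = 1;\r\n\t\tsecond = 2;\r\n"

# strip() only touches the beginning and end of the whole string,
# so the tabs before "second" survive.
stripped = text.strip()

# replace() removes every tab, wherever it occurs.
no_tabs = text.replace("\t", "")

print(repr(stripped))
print(repr(no_tabs))
```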
Edit after conversation in the comments
You can use the following code to extract the data as JSON:
import json
import requests
from bs4 import BeautifulSoup
url = "https://www.rebtel.com/en/rates/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
script = list(filter(None, soup.find_all("script")[4].text.replace("\t", "").split("\r\n")))
app_data = json.loads(script[1].replace("rebtel.appData = ", "")[:-1])
bootstrapped_data = json.loads(script[2].replace("rebtel.bootstrappedData = ", ""))
I split the script into lines with split("\r\n") and parsed the wanted data from those lines.
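If you would rather not depend on fixed line positions such as script[1] and script[2], the same idea can be sketched by looking up each assignment by name. This is only an illustration under stated assumptions: sample_script_text is a made-up stand-in for soup.find_all("script")[4].text, extract_assignments is a hypothetical helper, and it assumes each assignment sits on a single line, which may not hold for the real page.

```python
import json

# Hypothetical stand-in for soup.find_all("script")[4].text;
# the real page content is much larger and may be laid out differently.
sample_script_text = (
    "\r\n"
    "\t\trebtel = window.rebtel || {};\r\n"
    "\t\trebtel.appData = {\"locale\": \"en\"};\r\n"
    "\t\trebtel.bootstrappedData = {\"links\": {\"data\": []}}\r\n"
)


def extract_assignments(script_text):
    """Map each 'name = <json>' line to its parsed JSON value."""
    result = {}
    for line in script_text.splitlines():
        # Split on the first "=" only; any "=" inside the JSON stays in value.
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no assignment on this line (e.g. an empty line)
        value = value.strip().rstrip(";")
        try:
            result[name.strip()] = json.loads(value)
        except json.JSONDecodeError:
            continue  # right-hand side is JavaScript, not plain JSON
    return result


data = extract_assignments(sample_script_text)
app_data = data["rebtel.appData"]
bootstrapped_data = data["rebtel.bootstrappedData"]
print(app_data)
print(bootstrapped_data)
```

Lines whose right-hand side is not valid JSON (such as the first assignment in the sample) are skipped, so only the data objects end up in the dictionary.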