I want to obtain the value of the comment from the comment box, which is only displayed if a user has left a comment.
<div data-v-410f78e0="" ><span data-v-410f78e0="" >Message: </span> <div data-v-410f78e0="" ><div data-v-410f78e0="" ><span data-v-410f78e0=""> 603-779 852</span></div> <!----></div></div>
I'm trying to get 603-779 852 from there. This is what I tried:
from bs4 import BeautifulSoup
#Parse the HTML using Beautiful Soup
soup = BeautifulSoup(html, "html.parser")
#Find the element containing the string you want to extract
element = soup.find("div", class_="comment-desc comment-desc-inline")
#Extract the string from the element and remove any leading or trailing white space
string = element.text.strip()
#Remove the characters " ", "-", and space from the string
modified_string = string.replace(" ", "").replace("-", "").replace(" ", "")
#Slice the first character (index 0) from the modified string and remove it if it contains the character "6"
first_char = modified_string[0:1].replace("6", "")
#Verify that the resulting string starts with the character "0"
if first_char.startswith("0"):
    final_string = first_char + modified_string[1:]
else:
    final_string = modified_string
#Print the final string
print(final_string)
CodePudding user response:
I can guarantee you this isn't the best way to do this. But only half my brain is working today, so this is all I've got.
As they say, "If it's stupid but it works, it ain't stupid."
To fix your code, you can simply change:
string = element.text.strip()
To:
string = element.contents[0].contents[0]
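In case it helps to see why that works: .contents is the list of a tag's direct children, so the two [0] steps walk from the div to its span and then to the span's text node. Here is a minimal sketch on a trimmed-down snippet. The class name comment-desc and the shape of the markup are my assumptions for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: the class name is assumed and the data-v attributes are dropped
snippet = '<div class="comment-desc"><span> 603-779 852</span></div>'
element = BeautifulSoup(snippet, "html.parser").find("div", class_="comment-desc")

span = element.contents[0]   # first direct child of the div: the <span> tag
raw = span.contents[0]       # first direct child of the span: its text node
text = str(raw)              # plain str copy of the text node, whitespace untouched

print(span.name)             # span
print(text)
```

Note that nothing is stripped here, so the leading space is still in the string. The replace(" ", "") call in the full code takes care of it.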
Complete Code:
from bs4 import BeautifulSoup as bs
html = """<div data-v-410f78e0="" ><span data-v-410f78e0="" >Message: </span> <div data-v-410f78e0="" ><div data-v-410f78e0="" ><span data-v-410f78e0=""> 603-779 852</span></div> <!----></div></div>"""
soup = bs(html, "html.parser")
#Find the element containing the string you want to extract
element = soup.find("div", class_="comment-desc comment-desc-inline")
#Extract the text node from the first child of the element's first child (whitespace is removed in the next step)
string = element.contents[0].contents[0]
#Remove the characters " ", "-", and space from the string
modified_string = string.replace(" ", "").replace("-", "").replace(" ", "")
#Slice the first character (index 0) from the modified string and remove it if it contains the character "6"
first_char = modified_string[0:1].replace("6", "")
#Verify that the resulting string starts with the character "0"
if first_char.startswith("0"):
    final_string = first_char + modified_string[1:]
else:
    final_string = modified_string
#Print the final string
print(final_string)
Will output:
603779852
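If the class names on the real page turn out to be unreliable (the HTML pasted in the question doesn't show any), another option might be to anchor on the "Message:" label instead of a class. This is only a sketch: extract_message_digits is a name I made up, and it assumes the value always sits in the first span after the label.

```python
import re
from bs4 import BeautifulSoup

HTML = """<div data-v-410f78e0="" ><span data-v-410f78e0="" >Message: </span> <div data-v-410f78e0="" ><div data-v-410f78e0="" ><span data-v-410f78e0=""> 603-779 852</span></div> <!----></div></div>"""


def extract_message_digits(html):
    """Return the digits of the message value, or None if there is no message."""
    soup = BeautifulSoup(html, "html.parser")
    # Find the label span whose text contains "Message:"
    label = soup.find("span", string=re.compile(r"Message:"))
    if label is None:
        return None
    # Assumption: the value is the next <span> in document order after the label
    value = label.find_next("span")
    if value is None:
        return None
    # Drop everything that is not a digit (spaces, dashes, non-breaking spaces, ...)
    return re.sub(r"\D", "", value.get_text())


print(extract_message_digits(HTML))  # 603779852
```

It returns None when there is no "Message:" label at all, which should cover the case where the user left no comment and the box isn't rendered.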