Here is the data I am trying to extract:
<messages>
<message type="General" code="ER">SECURITY ALERT message(s) found.</message>
<message type="General">ORDER NUMBER: 7575757</message>
</messages>
I am trying to grab just the order number: 7575757
I have tried several methods of getting this value, but with no success.
First attempt:
def parseTestID(testid):
    dict = {'ORDER NUMBER': testid.split(" ")[0].split(":")[0]}
    return dict

parsedData = []
for element in bs_data.find_all("messages"):
    for message in element.find_all("message"):
        dict = {'type': message['type'], 'ORDER NUMBER': parseTestID(message.string)['ORDER NUMBER']}
        # append dictionary to list
        parsedData.append(dict)
# return list
print(parsedData)
Output:
[{'type': 'General', 'ORDER NUMBER': 'SECURITY'}, {'type': 'General', 'ORDER NUMBER': 'ORDER'}]
Second attempt:
for element in bs_data.find_all("messages"):
    for message in element.find_all("message"):
        print(message.text)
Output:
SECURITY ALERT message(s) found.
ORDER NUMBER: 7575757
I feel that I am close, but I am not quite sure how to grab this specific value.
CodePudding user response:
You could get that number with the following:
from bs4 import BeautifulSoup
import re
html = '''
<messages>
<message type="General" code="ER">SECURITY ALERT message(s) found.</message>
<message type="General">ORDER NUMBER: 7575757</message>
</messages>
'''
soup = BeautifulSoup(html, 'html.parser')
desired_info = soup.find('message', string = re.compile('ORDER NUMBER:')).text.split(':')[1].strip()
print(desired_info)
This would return just the number:
7575757
If you want the full string, you can skip the split above. The BeautifulSoup documentation (which I strongly recommend perusing at this point): https://beautiful-soup-4.readthedocs.io/en/latest/index.html
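Note that the one-liner above raises an AttributeError if no message contains "ORDER NUMBER:", because find() returns None in that case. A minimal sketch of a guarded version follows; the function name find_order_number and the capture-group pattern are my own choices for illustration, not something from the question:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical helper: returns the order number as a string, or None if absent.
ORDER_PATTERN = re.compile(r"ORDER NUMBER:\s*(\S+)")


def find_order_number(markup):
    soup = BeautifulSoup(markup, "html.parser")
    # find() returns None when no <message> text matches the pattern
    tag = soup.find("message", string=ORDER_PATTERN)
    if tag is None:
        return None
    # The capture group holds whatever follows "ORDER NUMBER:"
    match = ORDER_PATTERN.search(tag.get_text())
    return match.group(1) if match else None


sample = '''
<messages>
<message type="General" code="ER">SECURITY ALERT message(s) found.</message>
<message type="General">ORDER NUMBER: 7575757</message>
</messages>
'''

print(find_order_number(sample))
print(find_order_number("<messages><message>nothing here</message></messages>"))
```

With the sample data from the question, the first call should print 7575757 and the second should print None.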
CodePudding user response:
Another solution, without re:
num = soup.select_one('message:-soup-contains("ORDER NUMBER")').text.split()[-1]
print(num)
Prints:
7575757
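As for why the first attempt in the question printed 'SECURITY' and 'ORDER': split(" ")[0] keeps only the first space-separated word of each message, so the number is thrown away before the split on ":" ever happens. A minimal sketch of a corrected helper, assuming every order message has the form "ORDER NUMBER: <value>" (the name parse_order_number is hypothetical, and it works on plain strings such as message.text):

```python
def parse_order_number(text):
    """Return {'ORDER NUMBER': value}, or None if text is not an order message."""
    if not text:
        return None
    # partition splits on the first ":" only and keeps both sides
    label, sep, value = text.partition(":")
    if not sep or label.strip() != "ORDER NUMBER":
        return None
    return {"ORDER NUMBER": value.strip()}


# The two message strings from the question
messages = [
    "SECURITY ALERT message(s) found.",
    "ORDER NUMBER: 7575757",
]

# Keep only the messages that actually carry an order number
parsed = [d for d in map(parse_order_number, messages) if d is not None]
print(parsed)
```

This prints [{'ORDER NUMBER': '7575757'}]; the security alert is skipped because it contains no "ORDER NUMBER" label.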