Python: get a specific text from a text

I want to get a specific text from a text.

TEXT

    test = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"><html><body><div dir="ltr"><p>test test test</p><p><a href="https://test.com/users/confirmationconfirmation_token=XXXXXX">https://test.com/users/confirmation?confirmation_token=XXXXXX</a></p>
    <p>Link ile ilgili sorun yaşıyorsanız, kopyalayıp tarayıcınıza da yapıştırabilirsiniz.</p><p>Saygılarımızla,</p><p>test test test</p></div></body></html>"""

This is a string variable, not an HTML document.

I want to get the text "https://test.com/users/confirmation?confirmation_token=XXXXXX" out of it, but the token part (token=XXXXXX) changes every time.

Is there any way to extract just that text? Getting only the XXXXXX part would also be enough for me.

CodePudding user response：

Solution

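from bs4 import BeautifulSoup

# `response` is assumed to be the object that already holds the message body as a string.
html = response['items']['body']
soup = BeautifulSoup(html, 'html.parser')
# Text of the first <a> tag inside <body>, i.e. the confirmation URL.
print(soup.body.a.text)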
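
Since the question says the token alone would be enough, here is a minimal sketch of how the token could be pulled out of the link text once you have it. It uses only the standard library; the helper name `extract_token` and its `param` argument are made up for illustration, and the URL is the sample one from the question.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(url, param="confirmation_token"):
    """Return the first value of `param` in the URL's query string, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get(param)
    return values[0] if values else None


# Sample link from the question; in practice this would be the text of the <a> tag.
link = "https://test.com/users/confirmation?confirmation_token=XXXXXX"
token = extract_token(link)
print(token)  # XXXXXX
```

Parsing the query string rather than slicing the URL should keep working if other parameters are ever added to the link.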

CodePudding user response：

You can use a regular expression to solve your problem:

import re

test = """<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"><html><body><div dir="ltr">
 <p>test test test</p><p><a href="https://test.com/users/confirmationconfirmation_token=XXXXXX">https
 ://test.com/users/confirmation?confirmation_token=XXXXXX</a></p>
<p>Link ile ilgili sorun yaşıyorsanız, kopyalayıp tarayıcınıza da yapıştırabilirsiniz.</p><p>S
aygılarımızla,</p><p>test test test</p></div></body></html>"""

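# (.*?) lazily captures everything between "confirmation_token=" and the next "<".
# "." does not match a newline, so the occurrence inside the href is skipped here:
# a line break comes before any "<" on that line.
pattern = r'confirmation_token=(.*?)<'
find_list = re.findall(pattern, test)

print(find_list)
# Output: ['XXXXXX']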
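
If the string has no line breaks in it, the lazy pattern above can end up capturing part of the href as well. A slightly stricter variant is sketched below. It assumes the link always starts with `https://test.com/users/confirmation?confirmation_token=` and that the token contains no quotes, angle brackets, ampersands, or whitespace. The `body` string is a shortened, hypothetical stand-in for the real message, and `find_confirmation` is just an illustrative name.

```python
import re

# Hypothetical, shortened stand-in for the e-mail body from the question.
body = (
    '<p>test test test</p>'
    '<p><a href="https://test.com/users/confirmation?confirmation_token=XXXXXX">'
    'https://test.com/users/confirmation?confirmation_token=XXXXXX</a></p>'
)

# The character class stops the token at the first quote, angle bracket,
# ampersand, or whitespace character.
URL_RE = re.compile(
    r'https://test\.com/users/confirmation\?confirmation_token=([^"<>&\s]+)'
)


def find_confirmation(text):
    """Return (full_url, token) for the first match, or None if nothing matches."""
    match = URL_RE.search(text)
    if match is None:
        return None
    return match.group(0), match.group(1)


print(find_confirmation(body))
# ('https://test.com/users/confirmation?confirmation_token=XXXXXX', 'XXXXXX')
```

`re.search` returns only the first hit, so you get a single URL and token back instead of a list.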