Hi, can anyone please help me with this list? I want to separate the data into three parts. Each index of the list holds one element like the following (everything below is a single element):
[website='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
page_url='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine?answertab=active#tab-top'
data={'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']}
]
so that I can fetch each part independently, like this:
website='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
page_url='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine?answertab=active#tab-top'
data={'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']}
In my script I've been able to convert the list element to a string and split it, but I'm not getting the right result:
from extract_emails import DefaultFilterAndEmailAndLinkedinFactory as Factory
from extract_emails import DefaultWorker
from extract_emails.browsers.requests_browser import RequestsBrowser as Browser
browser = Browser()
print('Scraping.....')
# url = 'https://en.wikipedia.org/'
url = 'https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
factory = Factory(website_url=url, browser=browser, depth = 1, max_links_from_page=5)
worker = DefaultWorker(factory)
data = worker.get_data()
# ------------convert the data to a string----------#
part1 = str(data[3])
print(part1)
#-convert string to a list------#
list1 = list(part1.split())
print(list1)
#-------------------------#
value1 = list1[0]
value2 = list1[1]
value3 = list1[2]
print(value1)
print(value2)
print(value3)
But with the logic above I get the following result, which cuts off the email part that I need:
website='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
page_url='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine?answertab=active#tab-top'
data={'email':
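The cut-off happens because str.split() with no arguments splits on every run of whitespace, including the space after 'email': inside the dictionary. A minimal sketch on a made-up sample string (the URLs and addresses are placeholders, not the real scraped data) shows this, and shows that limiting the number of splits with maxsplit=2 keeps the data part whole. It assumes the three fields are separated by whitespace and that the two URLs contain none:

```python
# Placeholder string shaped like str(data[3]) in the question (hypothetical values).
sample = (
    "website='https://example.com/q' "
    "page_url='https://example.com/q?tab=active' "
    "data={'email': ['a@example.com', 'b@example.com']}"
)

# Plain split() breaks on every whitespace run, so the dict is cut apart.
parts = sample.split()
print(parts[2])  # data={'email':

# maxsplit=2 stops after the second split and leaves the rest of the string intact.
website, page_url, data_part = sample.split(maxsplit=2)
print(website)    # website='https://example.com/q'
print(page_url)   # page_url='https://example.com/q?tab=active'
print(data_part)  # data={'email': ['a@example.com', 'b@example.com']}
```

Because split() with no separator treats newlines as whitespace too, the same maxsplit=2 call should also work if the fields happen to be printed on separate lines.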
Answer:
Start with splitlines() to split the original data into lines. Then split each line at the first = only (the page URL itself contains one, in answertab=active), and use ast.literal_eval() to parse the quoted strings and the dictionary into Python objects.
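import ast
# part1 is the string from the question: part1 = str(data[3])
lines = part1.splitlines()
# split at the first '=' only, then evaluate the right-hand side as a Python literal
website = ast.literal_eval(lines[0].split('=', 1)[1])
page_url = ast.literal_eval(lines[1].split('=', 1)[1])
data = ast.literal_eval(lines[2].split('=', 1)[1])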
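splitlines() only helps if the fields really are on separate lines; the question's split() output is equally consistent with them being separated by single spaces. A sketch that tolerates either layout splits only at whitespace immediately followed by page_url= or data= (a regular-expression lookahead), then splits each chunk at its first = and parses the value with ast.literal_eval(). The helper name parse_fields and the sample string are illustrative assumptions, not part of extract_emails:

```python
import ast
import re


def parse_fields(text):
    """Hypothetical helper: turn "website=... page_url=... data=..." into a dict."""
    # Split only at whitespace followed by the next field name, so the
    # spaces inside the data dict are left alone.
    chunks = re.split(r"\s+(?=(?:page_url|data)=)", text.strip())
    result = {}
    for chunk in chunks:
        # Split at the first '=' only; the page URL itself contains '='.
        name, value = chunk.split("=", 1)
        # Turn the quoted string or dict literal into a real Python object.
        result[name] = ast.literal_eval(value)
    return result


sample = (
    "website='https://example.com/q'\n"
    "page_url='https://example.com/q?tab=active'\n"
    "data={'email': ['a@example.com', 'b@example.com']}"
)
fields = parse_fields(sample)
print(fields["website"])        # https://example.com/q
print(fields["page_url"])       # https://example.com/q?tab=active
print(fields["data"]["email"])  # ['a@example.com', 'b@example.com']
```

The same call returns the same dict if the newlines in sample are replaced by spaces. It assumes neither URL contains whitespace, which holds for well-formed URLs.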
Answer:
Split the string on the page_url and data markers, which separate website, page_url and data:
from extract_emails import DefaultFilterAndEmailAndLinkedinFactory as Factory
from extract_emails import DefaultWorker
from extract_emails.browsers.requests_browser import RequestsBrowser as Browser
browser = Browser()
print('Scraping.....')
# url = 'https://en.wikipedia.org/'
url = 'https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
factory = Factory(website_url=url, browser=browser, depth = 1, max_links_from_page=5)
worker = DefaultWorker(factory)
data = worker.get_data()
# ------------convert the data to a string----------#
part1 = str(data[3])
# collect the three pieces in a list
strs_list=[]
web=part1.split("page_url")[0]
strs_list.append(web)
part1=part1.replace(web,"")
page_url=part1.split("data")[0]
strs_list.append(page_url)
part1=part1.replace(page_url,"")
strs_list.append(part1)
for i in strs_list:
    print(i)
Output:
Scraping.....
website='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine'
page_url='https://stackoverflow.com/questions/20084356/python-3-email-extracting-search-engine?answertab=active#tab-top'
data={'email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']}
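In the answer's code, split("data") matches "data" anywhere in the string (for example inside a URL), and the replace() calls rebuild the string each time. A sketch of the same three-way split using str.partition(), which splits at the first occurrence of the separator only, is below. It assumes page_url= and data= appear only as field names; the helper name split_three and the sample string are hypothetical:

```python
def split_three(text):
    """Hypothetical helper: split "website=... page_url=... data=..." into 3 strings."""
    # partition() splits at the first occurrence only and returns
    # (before, separator, after); an absent separator gives (text, "", "").
    website_part, _, rest = text.partition("page_url=")
    page_url_value, _, data_value = rest.partition("data=")
    # Re-attach the field names and drop the separating whitespace/newlines.
    return [
        website_part.strip(),
        "page_url=" + page_url_value.strip(),
        "data=" + data_value.strip(),
    ]


sample = (
    "website='https://example.com/q' "
    "page_url='https://example.com/q?tab=active' "
    "data={'email': ['a@example.com', 'b@example.com']}"
)
for piece in split_three(sample):
    print(piece)
```

This prints the same three pieces as the output shown above, one per line, without the trailing whitespace.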