How to remove and write new line while scraping

Time:01-15

I made a small Python scraper that should find emails on the pages listed in a loaded .txt file, which contains one URL per line.

I am trying to write the emails to another .txt file, but somehow I am not able to write a new line for each scraped link loaded from the .txt file. There are also some unwanted characters such as [' and \n that keep being written to the text file even though they are not present on the URL I am scraping.

My scraper:

import os
import random
import re

import requests
from bs4 import BeautifulSoup

def scrapeEmails():
    global reqs, _lock, success, fails, rps, rpm

    with open(os.path.join("proxies.txt"), "r") as f:
        proxies = f.read().splitlines()
    with open(os.path.join("links_toscrape.txt"), "r") as f:
        channelLinks = f.read().splitlines()
    rndChannelLinks = random.choice(channelLinks)

    URL = rndChannelLinks + "/about"
    proxy = random.choice(proxies)
    proxies = {"https": "http://" + proxy}

    soup = BeautifulSoup(requests.get(URL, proxies=proxies).text, "html.parser") 
    _description = soup.find("meta", property="og:description")
    _content = _description["content"] if _description else "No meta title given"

    #for s in _content:
    if "@" in _content.lower():
        __email = re.findall("([\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10})", _content)
        cleanEmail = [x.replace("\n", "") for x in __email]
        print("Email: ",  cleanEmail)

        with open("scraped_emails.txt") as f:
            f.write(str(cleanEmail))
            f.close()
    else:
        print("Email of YouTube channel " + URL + " not found.")

CodePudding user response:

First of all, we cannot run this code to see what is wrong, because you did not share any example input. So I will try to work it out by reading the code.

  1. In the current code, the output file is opened and closed every time an email is written. Instead, open the file once before the loop and close it after the loop has finished. Otherwise it will slow your code down.

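  2. Use the 'a' (append) mode when opening the file, rather than 'w', so that newly scraped emails are appended to the file instead of overwriting its existing contents. Note that open() without a mode argument defaults to 'r' (read-only), so f.write() on a file opened that way fails.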

  3. You are calling f.close(), which is not needed: the with open(...) statement automatically closes the file when the block exits.
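
The three points above can be sketched together as follows. This is only an illustration under assumptions: save_emails, EMAIL_RE and the sample strings are hypothetical names and data, not part of the original scraper, and the regular expression is a simplified stand-in for the one in the question.

```python
import os
import re
import tempfile

# Simplified email pattern (an assumption, not the pattern from the question).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def save_emails(descriptions, out_path):
    """Append every email found in `descriptions` to `out_path`, one per line."""
    count = 0
    # Open the file once, in append mode, and reuse the handle for the whole loop.
    # The with block closes it automatically, so no f.close() is needed.
    with open(out_path, "a", encoding="utf-8") as f:
        for text in descriptions:
            for email in EMAIL_RE.findall(text):
                f.write(email + "\n")
                count += 1
    return count

# Hypothetical strings standing in for scraped og:description values.
sample = ["Business: hello@example.com", "no address here", "a@b.org and c@d.net"]
out_path = os.path.join(tempfile.mkdtemp(), "scraped_emails.txt")
written = save_emails(sample, out_path)
```

In a real scraper the loop would run over the links being scraped, with the file handle kept open until the last link has been processed.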

So, when you write to the file, use this code:

with open("scraped_emails.txt", 'a') as f:
    for email in cleanEmail:
        f.write(email + '\n')
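
As for the stray [' and \n characters, they most likely come from f.write(str(cleanEmail)), which writes the printed form of the whole list rather than its items. A minimal sketch of the difference, using made-up addresses (the variable names here are illustrative only):

```python
# Made-up values shaped like regex matches that captured surrounding whitespace.
emails = ["  first@example.com\n", "second@example.com "]

# str() on a list produces its repr: brackets, quotes and escaped newlines.
as_repr = str(emails)

# Stripping each item and joining with real newlines gives one email per line.
cleaned = [e.strip() for e in emails]
as_lines = "\n".join(cleaned) + "\n"

print(as_repr)
print(as_lines, end="")
```

Using strip() instead of replace("\n", "") also removes the leading and trailing spaces that the pattern in the question allows around each match.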