Writing a CSV file - Python 3.x web scraping

Time:09-27

I'm working on web scraping, and I write the data to a CSV file using the following code:

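import csv
from pathlib import Path

# `lists` holds the listing elements found earlier with BeautifulSoup
path = Path.cwd() / "data.csv"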
with path.open(mode='w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    for line in lists:
        title = line.find('a', class_='listing-search-item__link--title').text.replace('\n', '')
        writer.writerow(title)
with path.open(mode='r', encoding='utf-8', newline='') as read_file:
    read = csv.reader(read_file)
    for line in read:
        print(line)

There are some extra spaces in the file that I'm unable to avoid. When I print the data from the file, I get this output:

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'T', 'u', 'i', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'u', 'r', 'g', 'e', 'm', 'e', 'e', 's', 't', 'e', 'r', ' ', 'V', 'a', 'n', ' ', 'H', 'a', 'a', 'r', 'e', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'r', 'o', 'e', 'r', 's', 'v', 'e', 's', 't', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', "'", 's', '-', 'G', 'r', 'a', 'v', 'e', 'l', 'a', 'n', 'd', 's', 'e', 'w', 'e', 'g', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

CodePudding user response:

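In my experience, the problem is that your title is a string, but writerow expects an iterable of fields, such as a list. A bare string is iterated character by character, so every character ends up in its own column. So, just try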

writer.writerow([title])
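
To make the difference visible, here is a minimal sketch that writes the same title both ways into in-memory buffers instead of a file. The sample title and the buffer names are assumptions made for illustration, and the call to strip() is an extra step that removes the padding shown in the question.

```python
import csv
import io

# Sample value (assumption): padded like the text scraped in the question.
title = "        Apartment Tuinlaan           "

# A bare string is iterated by csv.writer, so each character becomes a field.
buf_wrong = io.StringIO()
csv.writer(buf_wrong).writerow(title)

# A one-element list keeps the whole title in a single field.
buf_right = io.StringIO()
csv.writer(buf_right).writerow([title.strip()])

print(repr(buf_wrong.getvalue()))
print(repr(buf_right.getvalue()))
```

Reading the first buffer back with csv.reader gives one field per character, which matches the output in the question.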

CodePudding user response:

You can either strip the whitespace right after scraping or, if you need to deal with the data in list form for some reason, use a function like this:

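def stripList(l: list, fromEnd='both'):
  # fromEnd: 'left', 'right', or anything else to strip both ends
  lInd = range(len(l))
  if fromEnd == 'right':
    lInd = reversed(lInd)
  for i in lInd:
    # the first non-blank item marks where the kept part starts (or ends)
    if str(l[i]).strip() != '':
      if fromEnd not in ['left', 'right']:
        # left side found; strip the right side of the remainder
        return stripList(l[i:], 'right')
      return l[i:] if fromEnd == 'left' else l[:i + 1]
  return []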

You can call it like this.
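
As a sketch of such a call, the snippet below repeats the function so that it runs on its own and applies it to a short made-up row. The row is an assumption, shortened from the kind of list printed in the question.

```python
def stripList(l: list, fromEnd='both'):
    lInd = range(len(l))
    if fromEnd == 'right':
        lInd = reversed(lInd)
    for i in lInd:
        if str(l[i]).strip() != '':
            if fromEnd not in ['left', 'right']:
                return stripList(l[i:], 'right')
            return l[i:] if fromEnd == 'left' else l[:i + 1]
    return []

# Hypothetical sample row with blank items on both ends.
row = [' ', ' ', 'A', 'p', 't', ' ', 'X', ' ', ' ']

print(stripList(row))           # ['A', 'p', 't', ' ', 'X']
print(stripList(row, 'left'))   # ['A', 'p', 't', ' ', 'X', ' ', ' ']
print(stripList(row, 'right'))  # [' ', ' ', 'A', 'p', 't', ' ', 'X']
```

Blank items inside the row are kept; only the leading and trailing ones are removed.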

If you don't actually want to keep the lists, but that's how you're receiving the data, you can use ''.join().
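
For instance, a minimal sketch of the join approach on a row shaped like the ones printed in the question (the row below is a shortened sample, so treat it as an assumption):

```python
# Shortened sample row: one character per item, padded with blanks.
row = [' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ',
       'T', 'u', 'i', 'n', 'l', 'a', 'a', 'n', ' ', ' ']

# Join the characters back into one string, then trim the outer whitespace.
title = ''.join(row).strip()
print(title)  # Apartment Tuinlaan
```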

CodePudding user response:

If you just want a csv document from those results (properly cleaned), why not do:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

big_list = []
# send a browser-like User-Agent with the request
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }
url = 'https://www.pararius.com/apartments/schiedam'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
# select the title link of every listing
titles = soup.select('h2.listing-search-item__title a')
for t in titles:
    # strip=True removes the surrounding whitespace and newlines
    big_list.append((t.get_text(strip=True),))
df = pd.DataFrame(big_list, columns=['Properties'])
df.to_csv('schiedam_rentals.csv')
print(df)

Result:

Properties
0   Apartment Korte Kerkstraat 1 B
1   Apartment Tuinlaan
2   Apartment Burgemeester Van Haarenlaan
3   Apartment Broersvest
4   Apartment 's-Gravelandseweg
5   Apartment Heijermansplein
6   Apartment Boerhaavelaan
7   Apartment Rotterdamsedijk 330
8   Apartment Newtonplein
9   Apartment Broersvest
10  Apartment Herenpad
11  House Ampèrestraat
12  Apartment Burgemeester Knappertlaan
13  Apartment Broersveld 91 A
14  Apartment 's-Gravelandseweg 1065
15  Apartment Jan Steenstraat
16  Apartment Burgemeester Knappertlaan
17  Apartment Nicolaas Beetsstraat 52 A
18  Apartment Schie 17 C
19  Apartment Frans Halsplein
20  Apartment Schiedamseweg 188
21  Apartment Tuinlaan 50 A
22  House Pascalstraat
23  Apartment Albert Cuijpstraat
24  Apartment Jan Steenstraat
25  Apartment Frans Halsplein
26  Apartment Baan 28 E
27  House Hargplein 51
28  Apartment Sint Liduinastraat