Writing csv file - Python-3.x web-scraping

I'm working on web scraping, and I write the data to a csv file using the following code:

path = Path.cwd() / "data.csv"
with path.open(mode='w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    for line in lists:
        title = line.find('a', class_='listing-search-item__link--title').text.replace('\n', '')
        writer.writerow(title)
with path.open(mode='r', encoding='utf-8', newline='') as read_file:
    read = csv.reader(read_file)
    for line in read:
        print(line)

There are some extra spaces in the file that I can't get rid of. When printing the data from the file, I get this output:

[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'T', 'u', 'i', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'u', 'r', 'g', 'e', 'm', 'e', 'e', 's', 't', 'e', 'r', ' ', 'V', 'a', 'n', ' ', 'H', 'a', 'a', 'r', 'e', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'r', 'o', 'e', 'r', 's', 'v', 'e', 's', 't', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', "'", 's', '-', 'G', 'r', 'a', 'v', 'e', 'l', 'a', 'n', 'd', 's', 'e', 'w', 'e', 'g', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']

CodePudding user response：

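In my experience, the problem is that title is a string, but writerow needs an iterable of fields, such as a list. A string passed directly is iterated character by character, so each character becomes its own field. So, just try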

writer.writerow([title])
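
To see why the brackets matter, here is a minimal sketch that writes the same title both ways into an in-memory buffer and reads it back. The sample title and the use of io.StringIO in place of a real file are assumptions for illustration, and the .strip() call is an addition that also removes the surrounding whitespace.

```python
import csv
import io

# Hypothetical scraped title, padded like the ones in the question.
title = "        Apartment Tuinlaan           "

# A bare string is iterated character by character: one field per character.
buf_wrong = io.StringIO(newline="")
csv.writer(buf_wrong).writerow(title)

# A one-element list holding the stripped string is written as a single field.
buf_right = io.StringIO(newline="")
csv.writer(buf_right).writerow([title.strip()])

# Read both buffers back the same way the question reads the file.
rows_wrong = list(csv.reader(io.StringIO(buf_wrong.getvalue(), newline="")))
rows_right = list(csv.reader(io.StringIO(buf_right.getvalue(), newline="")))

print(len(rows_wrong[0]))  # as many fields as the title has characters
print(rows_right)
```

The first row comes back as a list of single characters, as in the question; the second comes back as one clean field.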

CodePudding user response：

You can either strip the whitespace right after scraping or, if you need to deal with it in list form for some reason, use a function like this:

def stripList(l: list, fromEnd='both'):
  # Drop blank items from the left, the right, or both ends of the list
  lInd = range(len(l))
  if fromEnd == 'right':
    lInd = reversed(lInd)
  for i in lInd:
    if str(l[i]).strip() != '':
      if fromEnd not in ['left', 'right']:
        # Left side is done at index i; now strip the right side
        return stripList(l[i:], 'right')
      return l[i:] if fromEnd == 'left' else l[:i + 1]
  return []

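You can call it on each row read from the file, for example stripList(line). By default it strips both ends; pass 'left' or 'right' as the second argument to strip one side only.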
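
A call might look like the following sketch. The function is repeated here so the block runs on its own, assuming the final slice is meant to be l[:i + 1]; the sample row is made up for illustration.

```python
def stripList(l: list, fromEnd='both'):
    # Drop blank items from the left, the right, or both ends of the list.
    lInd = range(len(l))
    if fromEnd == 'right':
        lInd = reversed(lInd)
    for i in lInd:
        if str(l[i]).strip() != '':
            if fromEnd not in ['left', 'right']:
                return stripList(l[i:], 'right')
            return l[i:] if fromEnd == 'left' else l[:i + 1]
    return []


# Hypothetical row, shaped like the character lists printed in the question.
row = [' ', ' ', 'A', 'p', 't', ' ', 'X', ' ', ' ']

both = stripList(row)            # blanks removed from both ends
left = stripList(row, 'left')    # only leading blanks removed
right = stripList(row, 'right')  # only trailing blanks removed
print(both)
```

Blank items in the middle of the list are kept; only the ends are trimmed.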

If you don't actually want to keep the lists, but that's how you're receiving the data, you can use ''.join() to turn each list back into a single string.
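
As a small sketch of the ''.join() route, assuming a row shaped like the ones printed in the question (the sample row below is shortened and made up), the characters can be joined and the result stripped in one step:

```python
# Hypothetical row of single characters, as returned by csv.reader in the question.
row = [' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ',
       'T', 'u', 'i', 'n', 'l', 'a', 'a', 'n', ' ', ' ']

# Join the characters back into one string, then remove the outer whitespace.
title = ''.join(row).strip()
print(title)
```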

CodePudding user response：

If you just want a csv document from those results (properly cleaned), why not do:

import requests
import pandas as pd
from bs4 import BeautifulSoup as bs

big_list = []
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
    }
url = 'https://www.pararius.com/apartments/schiedam'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
titles = soup.select('h2.listing-search-item__title a')
for t in titles:
    big_list.append((t.get_text(strip=True),))
df = pd.DataFrame(big_list, columns=['Properties'])
df.to_csv('schiedam_rentals.csv')
print(df)

Result:

Properties
0   Apartment Korte Kerkstraat 1 B
1   Apartment Tuinlaan
2   Apartment Burgemeester Van Haarenlaan
3   Apartment Broersvest
4   Apartment 's-Gravelandseweg
5   Apartment Heijermansplein
6   Apartment Boerhaavelaan
7   Apartment Rotterdamsedijk 330
8   Apartment Newtonplein
9   Apartment Broersvest
10  Apartment Herenpad
11  House Ampèrestraat
12  Apartment Burgemeester Knappertlaan
13  Apartment Broersveld 91 A
14  Apartment 's-Gravelandseweg 1065
15  Apartment Jan Steenstraat
16  Apartment Burgemeester Knappertlaan
17  Apartment Nicolaas Beetsstraat 52 A
18  Apartment Schie 17 C
19  Apartment Frans Halsplein
20  Apartment Schiedamseweg 188
21  Apartment Tuinlaan 50 A
22  House Pascalstraat
23  Apartment Albert Cuijpstraat
24  Apartment Jan Steenstraat
25  Apartment Frans Halsplein
26  Apartment Baan 28 E
27  House Hargplein 51
28  Apartment Sint Liduinastraat