I'm working on web scraping, and I'm writing the data to a CSV file using the following code:
path = Path.cwd() / "data.csv"
with path.open(mode='w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    for line in lists:
        title = line.find('a', class_='listing-search-item__link--title').text.replace('\n', '')
        writer.writerow(title)
with path.open(mode='r', encoding='utf-8', newline='') as read_file:
    read = csv.reader(read_file)
    for line in read:
        print(line)
There are some extra spaces in the file that I'm unable to avoid, and every character ends up separated. When printing the data back from the file I get this output:
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'T', 'u', 'i', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'u', 'r', 'g', 'e', 'm', 'e', 'e', 's', 't', 'e', 'r', ' ', 'V', 'a', 'n', ' ', 'H', 'a', 'a', 'r', 'e', 'n', 'l', 'a', 'a', 'n', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', 'B', 'r', 'o', 'e', 'r', 's', 'v', 'e', 's', 't', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
[' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', "'", 's', '-', 'G', 'r', 'a', 'v', 'e', 'l', 'a', 'n', 'd', 's', 'e', 'w', 'e', 'g', ' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
CodePudding user response:
In my experience, your title is a string, but writerow needs an iterable such as a list. A string is itself iterated character by character, which is why each character lands in its own CSV field. So, just try:
writer.writerow([title])
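To illustrate the difference (a minimal sketch using an in-memory buffer rather than the question's file, with a made-up title):

```python
import csv
import io

title = "Apartment Tuinlaan"

# Passing the bare string: writerow iterates it, so each
# character becomes its own CSV field.
buf = io.StringIO()
csv.writer(buf).writerow(title)
print(buf.getvalue())  # A,p,a,r,t,m,e,n,t, ,T,u,i,n,l,a,a,n

# Wrapping the string in a list: one field per row, as intended.
buf = io.StringIO()
csv.writer(buf).writerow([title])
print(buf.getvalue())  # Apartment Tuinlaan
```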
CodePudding user response:
You can either just strip them right after scraping, or (if you need to deal with the data in list form for some reason) you can use a function like this:
def stripList(l: list, fromEnd='both'):
    lInd = range(len(l))
    if fromEnd == 'right':
        lInd = reversed(lInd)
    for i in lInd:
        if str(l[i]).strip() != '':
            if fromEnd not in ['left', 'right']:
                return stripList(l[i:], 'right')
            return l[i:] if fromEnd == 'left' else l[:i + 1]
    return []
You can call it on a row directly, e.g. stripList(row). If you don't actually want to keep the lists, but that's just how you're receiving the data, you can use ''.join() on the result.
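A quick sketch of both usages, applied to a shortened version of one of the rows from the question:

```python
def stripList(l: list, fromEnd='both'):
    lInd = range(len(l))
    if fromEnd == 'right':
        lInd = reversed(lInd)
    for i in lInd:
        if str(l[i]).strip() != '':
            if fromEnd not in ['left', 'right']:
                # Trim the left side, then recurse to trim the right.
                return stripList(l[i:], 'right')
            return l[i:] if fromEnd == 'left' else l[:i + 1]
    return []

# Abbreviated version of a row printed in the question.
row = [' ', ' ', 'A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't', ' ', ' ']
print(stripList(row))           # ['A', 'p', 'a', 'r', 't', 'm', 'e', 'n', 't']
print(''.join(stripList(row)))  # Apartment
```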
CodePudding user response:
If you just want a csv document from those results (properly cleaned), why not do:
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
big_list = []
headers = {
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36"
}
url = 'https://www.pararius.com/apartments/schiedam'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
titles = soup.select('h2.listing-search-item__title a')
for t in titles:
    big_list.append((t.get_text(strip=True),))
df = pd.DataFrame(big_list, columns=['Properties'])
df.to_csv('schiedam_rentals.csv')
print(df)
Result:
Properties
0 Apartment Korte Kerkstraat 1 B
1 Apartment Tuinlaan
2 Apartment Burgemeester Van Haarenlaan
3 Apartment Broersvest
4 Apartment 's-Gravelandseweg
5 Apartment Heijermansplein
6 Apartment Boerhaavelaan
7 Apartment Rotterdamsedijk 330
8 Apartment Newtonplein
9 Apartment Broersvest
10 Apartment Herenpad
11 House Ampèrestraat
12 Apartment Burgemeester Knappertlaan
13 Apartment Broersveld 91 A
14 Apartment 's-Gravelandseweg 1065
15 Apartment Jan Steenstraat
16 Apartment Burgemeester Knappertlaan
17 Apartment Nicolaas Beetsstraat 52 A
18 Apartment Schie 17 C
19 Apartment Frans Halsplein
20 Apartment Schiedamseweg 188
21 Apartment Tuinlaan 50 A
22 House Pascalstraat
23 Apartment Albert Cuijpstraat
24 Apartment Jan Steenstraat
25 Apartment Frans Halsplein
26 Apartment Baan 28 E
27 House Hargplein 51
28 Apartment Sint Liduinastraat
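If you'd rather stay with the question's csv.writer approach instead of pandas, the same two fixes carry over: strip each title (get_text(strip=True) in BeautifulSoup, or plain str.strip()) and wrap it in a list before writerow. A minimal sketch with placeholder titles standing in for the scraped tag text:

```python
import csv
from pathlib import Path

# Placeholder titles standing in for the scraped tag text from the question.
titles = ["  Apartment Tuinlaan  ", "  Apartment Broersvest  "]

path = Path.cwd() / "data.csv"
with path.open(mode='w', encoding='utf-8', newline='') as file:
    writer = csv.writer(file)
    for title in titles:
        # strip() removes the padding; the list keeps the title in one field.
        writer.writerow([title.strip()])

with path.open(mode='r', encoding='utf-8', newline='') as read_file:
    for row in csv.reader(read_file):
        print(row)  # e.g. ['Apartment Tuinlaan']
```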