My CSV file only ever stores a single row of data: when I use a range to write to the CSV, it writes the same line over and over until the range is exhausted. I can't fix this bug; it has taken me two days.
import csv
import requests
from bs4 import BeautifulSoup

for page in range(0, 10):
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)
    # open the file in the write mode
    # f = open('file.csv', 'w', newline='')
    header = ['Title', 'Tag', 'UTC', 'Web_Address']
    # write a row to the csv file
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("main")
    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")
        data = [title, tag, datetime, address, img]
        counter = range(100)

with open('crypto.csv', 'a', newline='') as crypto:
    FileWriter = csv.writer(crypto)
    FileWriter.writerow(header)
    for x in counter:
        FileWriter.writerow(data)  # writer.writerows(data)
CodePudding user response:
You aren't storing the data: it's overwritten on each pass through the lists, so only the last row ever survives. Secondly, I'd opt to use pandas here to create a dataframe, then just write that to file. Also, you collect 5 items to write but only have 4 column names.
import pandas as pd
import requests
from bs4 import BeautifulSoup

data = []
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("main")
    for lis in lists:
        title = lis.find('a', class_="title").text
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_="datetime").text
        address = lis.find('div', class_="middle-xs").text
        img = lis.find('span', class_="src")
        data.append([title, tag, datetime, address, img])

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)
Also, I'm not sure what you want as the output (as you don't say): find_all("main") matches only the page's single <main> element, so the code above captures just the first article per page. Is this more accurate?
import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

data = []
for page in range(0, 10):
    print(page)
    url = "https://cryptonews.net/?page={page}".format(page=page)
    # print(url)
    page = requests.get(url)
    soup = BeautifulSoup(page.content, "html.parser")
    lists = soup.find_all("div", {'class': re.compile('^row news-item.*')})
    for lis in lists:
        title = lis['data-title']
        tag = lis.find('span', class_="etc-mark").text
        datetime = lis.find('span', class_=re.compile("^datetime")).text.strip()
        address = lis['data-domain']
        img = lis['data-image']
        data.append([title, tag, datetime, address, img])

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']
df = pd.DataFrame(data, columns=header)
df.to_csv('crypto.csv', index=False)
Output:
print(df)
Title ... Image
0 ETH Breaches $1,500 Level As Ethereum Adds Ove... ... https://cnews24.ru/uploads/e29/e29a5677e448f6e...
1 India Seeing Spike in Drug Smuggling Using Cry... ... https://cnews24.ru/uploads/65b/65b50302f65e12c...
2 Optimism (OP) Price Prediction: 87% Rally Is J... ... https://cnews24.ru/uploads/5e1/5e1189bbb2c1e2b...
3 Mysterious Whale Adds 3.94 Trillion Shiba Inu ... https://cnews24.ru/uploads/54a/54af6726248c29a...
4 Are the big fundraising efforts of blockchain ... ... https://cnews24.ru/uploads/5af/5afb066d81be4a6...
.. ... ... ...
195 Terra Classic (LUNC) Chief Community Officer S... ... https://cnews24.ru/uploads/a53/a53fd4206ab5f95...
196 Reddit NFT Collection: How to Sell Your Avatar... ... https://cnews24.ru/uploads/ab6/ab6718f707c3428...
197 In Topsy Turvy Market Logic, Positive U.S. GDP... ... https://cnews24.ru/uploads/264/264ab9327f4774a...
198 XRP Wallets Spikes Above 4.34M, Gaining 29,883... ... https://cnews24.ru/uploads/2e5/2e56d092b7c253b...
199 Are crypto trading bots legit? ... https://cnews24.ru/uploads/ccb/ccb73d9d9b79280...
[200 rows x 5 columns]
CodePudding user response:
First, you are setting data = [title, tag, datetime, address, img] on every loop iteration but never saving it anywhere: each pass through the loop replaces data with the next row's values, so the full dataset is never accumulated.
Then, you pass that same data variable to FileWriter.writerow() on every iteration of the counter loop, without ever changing its value, so the last scraped row is written 100 times. You need to write the specific row for each loop iteration.
Fix both these issues and your code should work.
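For reference, here is a minimal csv-based sketch of that fix. It reuses the news-item selectors from the pandas answer above (those selectors, and the renamed variables such as response and page_num, are assumptions for illustration, not tested against the live site): the file is opened once, the header is written once, and each row is written as it is scraped.

import csv
import re
import requests
from bs4 import BeautifulSoup

header = ['Title', 'Tag', 'UTC', 'Web_Address', 'Image']

# Open the file once in write mode so the header appears a single time.
with open('crypto.csv', 'w', newline='') as crypto:
    writer = csv.writer(crypto)
    writer.writerow(header)
    for page_num in range(0, 10):
        url = "https://cryptonews.net/?page={page}".format(page=page_num)
        # Use a separate name for the response so the loop variable isn't shadowed,
        # unlike the original code, which reassigned page to the response object.
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        # Same news-item selector as in the pandas answer above (an assumption here).
        for lis in soup.find_all("div", {'class': re.compile('^row news-item.*')}):
            row = [
                lis['data-title'],
                lis.find('span', class_="etc-mark").text,
                lis.find('span', class_=re.compile("^datetime")).text.strip(),
                lis['data-domain'],
                lis['data-image'],
            ]
            writer.writerow(row)  # write this specific row, not one reused variable

Writing each row inside the loop also means partial results survive if a later page request fails.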