I want to remove a symbol from a CSV file that I just created by web scraping. For context, my coordinates contain a degree symbol (°), and I want to remove it.
Here is my code:
#import modules
import requests
import urllib.request
from bs4 import BeautifulSoup
from datetime import datetime
import time
import csv
import os
import re
from selenium import webdriver
import schedule
try:
    def retrieve_website():
        # Create header
        headers = {'user-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}
        # URL of the ship you want to track, execute the request and parse it to the variable 'soup'
        url = 'https://website-'
        reqs = requests.get(url, headers=headers)
        soup = BeautifulSoup(reqs.text, 'lxml')
        # Save file to local disk
        with open("output1.html", "w", encoding='utf-8') as file:
            file.write(str(soup))
        # Read the file back from local disk
        with open("output1.html", "r", encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'lxml')
        # All td tags are read into a list
        data = soup.find_all('td')
        # Extract the coordinates
        Longitude = data[23].get_text()
        Latitude = data[24].get_text()
        # Extract heading
        Heading = data[27].get_text()
        # Save location
        dwnpath = r'S:\location'
        # Write data to a csv file with comma as separator
        with open(os.path.join(dwnpath, 'Track.csv'), 'w', newline='') as csv_file:
            fieldnames = ['Longitude', 'Latitude', 'Heading']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=',')
            writer.writeheader()
            writer.writerow({'Longitude': Longitude,
                             'Latitude': Latitude,
                             'Heading': Heading})
    # Start the function the first time when the program starts
    retrieve_website()
except Exception as error:
    print(error)
print('Script Complete!')
The code above scrapes some information from a specific website. The coordinates I retrieve look like this:
Longitude Latitude Heading
1234° 456° 789°
But I want it to look like this:
Longitude Latitude Heading
1234 456 789
Thanks.
CodePudding user response:
This should do the trick!
...
writer.writerow({
    'Longitude': Longitude.replace('°', ''),
    'Latitude': Latitude.replace('°', ''),
    'Heading': Heading.replace('°', ''),
})
...
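If you would rather not repeat the replace call for every field, a small helper can clean the whole row at once. This is only a sketch: `strip_degree` is a name made up for illustration, the sample values stand in for the scraped cell text, and `io.StringIO` stands in for the real `Track.csv` file so the snippet runs on its own.

```python
import csv
import io


def strip_degree(value):
    """Return the text without degree signs or surrounding whitespace."""
    return value.replace('°', '').strip()


# Hypothetical sample values standing in for the scraped cell text.
row = {'Longitude': '1234°', 'Latitude': '456°', 'Heading': '789°'}
cleaned = {key: strip_degree(value) for key, value in row.items()}

# io.StringIO stands in for the real CSV file so the sketch runs anywhere.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(cleaned))
writer.writeheader()
writer.writerow(cleaned)
print(buffer.getvalue())
```

In the original script, the same dict comprehension could be applied to the row just before `writer.writerow(...)`.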
CodePudding user response:
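The other answers work too. To generalize the solution, you can use a regex to remove every character that is not a digit. Note that this pattern also strips decimal points and minus signs, so it only suits whole, unsigned numbers.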
import re
s = "1°°23%%&&**!!"
numeric_string = re.sub("[^0-9]", "", s)
Which results in:
>> 123
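Coordinates are often signed decimals, so a slightly wider character class may be safer. The sketch below assumes values such as "-12.5°" (a made-up sample, not taken from the question) and keeps digits, the decimal point, and the minus sign. `keep_number` is a hypothetical helper name.

```python
import re


def keep_number(text):
    """Drop every character except digits, '.', and '-'."""
    return re.sub(r"[^0-9.\-]", "", text)


sample = "-12.5°"  # assumed example value

# The digits-only pattern loses the sign and the decimal point.
digits_only = re.sub("[^0-9]", "", sample)

# The wider pattern keeps them.
number_text = keep_number(sample)

print(digits_only)
print(number_text)
```

This still does not validate the result: an input like "1.2.3" would pass through unchanged, so treat it as a cleanup step rather than a parser.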
CodePudding user response:
Have you tried str.replace? Let's say you have a string '1260°':
s='1260°'
this:
s.replace('°', '')
will return '1260'
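One thing that is easy to miss: Python strings are immutable, so `replace` returns a new string and leaves the original alone. A minimal sketch of the difference, where `unchanged` is just an illustrative variable name:

```python
s = '1260°'

s.replace('°', '')       # result is discarded; s is not modified
unchanged = s            # still contains the degree sign

s = s.replace('°', '')   # rebind the name to keep the cleaned text

print(unchanged)
print(s)
```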