Remove symbol in CSV

Time:06-17

I want to remove a symbol from a CSV file that I just created by web scraping. For context, my coordinates contain the degree symbol (°) and I want to remove it.

Here is my code:

#import modules
import requests
import urllib.request
from bs4 import BeautifulSoup
from datetime import datetime
import time
import csv
import os
import re
from selenium import webdriver
import schedule

try:      
    def retrieve_website():
        # Create header
        headers = {'user-agent': 'Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17'}

        # URL of the ship you want to track, execute the request and parse it to the variable 'soup'
        url = 'https://website-'
        reqs = requests.get(url, headers=headers)
        soup = BeautifulSoup(reqs.text, 'lxml')

        # Save file to local disk
        with open("output1.html", "w", encoding='utf-8') as file:
            file.write(str(soup))

        # Re-open the saved file from local disk
        with open("output1.html", "r", encoding='utf-8') as file:
            soup = BeautifulSoup(file, 'lxml')

        # All td tags are read into a list
        data = soup.find_all('td')

        # Extract the coordinates
        Longitude = data[23].get_text()
        Latitude = data[24].get_text()
    
        # Extract heading
        Heading = data[27].get_text()
        
        # Folder where the CSV file is saved
        dwnpath = r'S:\location'
        
        # Write data to a CSV file with comma as separator
        with open(os.path.join(dwnpath, 'Track.csv'), 'w', newline='') as csv_file:
            fieldnames = ['Longitude', 'Latitude', 'Heading']
            writer = csv.DictWriter(csv_file, fieldnames=fieldnames, delimiter=',')
            writer.writeheader()
            writer.writerow({'Longitude': Longitude,
                             'Latitude': Latitude,
                             'Heading': Heading})
    
    # Run the function once when the program starts
    retrieve_website()
    
except Exception as error:
    print(error)
    
print('Script Complete!')

Above is my code, which scrapes some information from a specific website. I retrieved the coordinates, and the output looks like this:

Longitude Latitude Heading
1234°     456°     789°

But I want it to look like this:

Longitude Latitude Heading
1234      456      789

Thanks.

CodePudding user response:

This should do the trick!

...
writer.writerow({
    'Longitude': Longitude.replace('°', ''), 
    'Latitude': Latitude.replace('°', ''),
    'Heading': Heading.replace('°', ''),
})
...    
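
If there are more columns later, the same replace can be applied to every field in one place. This is only a minimal sketch: strip_degree is a hypothetical helper name, the row values are made-up stand-ins for the scraped text, and it writes to an in-memory buffer instead of Track.csv so it can be run anywhere.

```python
import csv
import io


def strip_degree(value):
    """Return the text without degree signs or surrounding whitespace."""
    return value.replace('°', '').strip()


# Made-up values standing in for data[23], data[24] and data[27]
row = {'Longitude': '1234°', 'Latitude': '456°', 'Heading': '789°'}

# Clean every field before it reaches the CSV writer
cleaned = {key: strip_degree(value) for key, value in row.items()}

# In-memory buffer for illustration; a real script would use
# open(path, 'w', newline='') as in the question
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(cleaned))
writer.writeheader()
writer.writerow(cleaned)
csv_text = buffer.getvalue()
```

Cleaning the values before calling writerow keeps the writing code unchanged from the question.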

CodePudding user response:

The other answers work too. However, to generalize the solution, you can use a regular expression to remove every character that is not a digit.

import re
s = "1°°23%%&&**!!"
numeric_string = re.sub("[^0-9]", "", s)

Which results in:

>> 123
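
One caveat: the pattern above also removes decimal points and minus signs, which real coordinates usually contain. A possible variant, assuming the values look like "-12.5°", keeps those two characters as well. keep_number_chars is a hypothetical name, and this sketch does not check that the result is a well-formed number.

```python
import re


def keep_number_chars(text):
    """Drop every character except digits, '.', and '-'."""
    return re.sub(r"[^0-9.\-]", "", text)


coordinate = keep_number_chars("-12.5°")
digits_only = keep_number_chars("1°°23%%&&**!!")
```

Here coordinate is "-12.5" and digits_only is "123". A stray "." or "-" elsewhere in the text would be kept too, so this is only safe when the scraped cell holds a single number.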

CodePudding user response:

Have you tried str.replace? Let's say you have a string '1260°':

 s='1260°'

this:

 s.replace('°', '') 

will return '1260'
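
Note that Python strings are immutable, so str.replace returns a new string and leaves s itself alone. A short sketch of the difference (the variable name unchanged is only for illustration):

```python
s = '1260°'

s.replace('°', '')       # the result is discarded, s is not modified
unchanged = s            # still contains the degree sign

s = s.replace('°', '')   # rebind s to keep the cleaned value
```

After the last line, s is '1260', while unchanged still holds '1260°'.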
