Why does KeyError: 'url' occur while reading a list of URLs from a CSV file with Python?

Time:02-17

I got this KeyError while trying to read a list of URLs from a CSV file with Python:

        C:\Users\user\Desktop\urls>python urla.py
        Traceback (most recent call last):
          File "C:\Users\user\Desktop\urls\urla.py", line 6, in <module>
            print(row["url"])
        KeyError: 'url'

The error comes from this snippet:

with open('myurls.csv', newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    urls = [row["url"] for row in reader]

within this complete script:

from urllib.request import urlopen
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
import csv




#GET TEXT
def getPageText(url):
    # given a url, get page content
    data = urlopen(url).read()
    # parse as html structured document
    soup = BeautifulSoup(data, 'html.parser')
    # kill javascript content
    for s in soup(["script", "style"]):
        s.replaceWith('')
    #
    for p in soup.find_all('p')[1]:
        lnk = p.get_text()
        print(lnk)
    #
    # find body and extract text
    p = soup.find("div", attrs={'class': 'article-content retro-folders'})
    p.append(p.get_text())
    x = p.text
    y = x.replace("\r", "").replace("\n", "")
    print(y)
    
    # Compiling the info
    lnktxt_data = [lnk, y]


    
    # Append the info to the complete dataset
    url_txt.append(lnktxt_data)

url_txt = []
    
#Get text from multiple urls    
def main():
    
    with open('myurls.csv', newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        urls = [row["url"] for row in reader]
            
    txt = [getPageText(url) for url in urls]
    for t in txt:
        print(t)
    
if __name__=="__main__":
    main()
    
#FRAME DATA
# Making the dataframe
url_txt = pd.DataFrame(url_txt, columns = ['lnk', 'y'])
 
url_txt.head()
    
#CREATE A FILE
# Save as CSV File
url_txt.to_csv('url_txt.csv',index=False)

What's causing the KeyError?

CodePudding user response:

It may be possible that the row variable is not a dictionary.
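
For what it's worth, csv.DictReader yields a dictionary for every row, so a quicker check is to look at the column names it actually parsed. Below is a minimal sketch of that check. The temporary file is only a stand-in for myurls.csv, read_header is a hypothetical helper, and the cp1252 encoding is an assumption about what a Windows machine would use by default.

```python
import csv
import os
import tempfile


def read_header(path, encoding):
    """Return the column names csv.DictReader sees for the given file."""
    with open(path, newline='', encoding=encoding) as csv_file:
        return csv.DictReader(csv_file).fieldnames


# Stand-in for myurls.csv (assumption): one 'url' column, saved with a UTF-8 BOM.
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbfurl\r\nhttps://example.com/a\r\n')
    sample_path = tmp.name

# cp1252 is assumed here as the typical default encoding on Windows.
header_cp1252 = read_header(sample_path, 'cp1252')
header_utf8 = read_header(sample_path, 'utf-8')

# ascii() makes hidden or unexpected characters in the names visible.
print([ascii(name) for name in header_cp1252])
print([ascii(name) for name in header_utf8])

os.remove(sample_path)
```

If the printed name is anything other than exactly 'url', then row["url"] will raise KeyError even though the file looks fine in an editor.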

CodePudding user response:

I found the cause from this post:

Why does my Python code print the extra characters "ï»¿" when reading from a text file?

Basically, the CSV had been saved as UTF-8 with BOM. The BOM bytes were read as the extra characters "ï»¿" in front of the first header, so the column name became ï»¿url instead of url, and that is what caused the KeyError.

To solve the problem, save the file as UTF-8 without BOM.
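
If re-saving the file is not convenient, another option is to let Python strip the BOM while reading. This is not what I did, just a sketch that assumes the file is otherwise valid UTF-8. The utf-8-sig codec skips a leading BOM when one is present and behaves like plain UTF-8 when it is not. The helper names and sample URLs below are made up for illustration.

```python
import csv
import os
import tempfile


def read_urls(path):
    """Read the 'url' column; utf-8-sig drops a leading BOM if one exists."""
    with open(path, newline='', encoding='utf-8-sig') as csv_file:
        reader = csv.DictReader(csv_file)
        return [row['url'] for row in reader]


def write_sample(raw_bytes):
    """Write raw bytes to a temporary CSV file and return its path."""
    with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
        tmp.write(raw_bytes)
        return tmp.name


# Same made-up content, once with a BOM and once without.
body = b'url\r\nhttps://example.com/a\r\nhttps://example.com/b\r\n'
path_with_bom = write_sample(b'\xef\xbb\xbf' + body)
path_without_bom = write_sample(body)

urls_with_bom = read_urls(path_with_bom)
urls_without_bom = read_urls(path_without_bom)
print(urls_with_bom)
print(urls_without_bom)

os.remove(path_with_bom)
os.remove(path_without_bom)
```

Either way the header is read as url, so the same open() call works whether or not the editor added a BOM.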

Full solution details:

UTF-8, not UTF-8 with BOM, to read URLs in CSV in Python
