Here is the code we're working with; it basically just takes data from multiple scraped datasets and then concatenates them.
import pandas as pd
import numpy as np # for numeric python functions
from pylab import * # for easy matplotlib plotting
from bs4 import BeautifulSoup
import requests
df1 = pd.read_html(url1)
#the table works - now let's make it look at change owned to find the largest value
n = np.quantile(table['Qty'], [0.50])
print("99th percentile: ",n)
q=table.sort_values('Qty', ascending = False)
page = requests.get(url1)
name=q['Ticker'].str.replace('\d ', '')
name1 = (table['Ticker'])
n = name1.count()
#Buyers for the company
All = []
url = ''
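for entry in name1:
    # read_html returns a list of DataFrames; keeping the first one is an assumption
    table2 = pd.read_html(url + entry)
    All.append(table2[0])
All = pd.concat(All)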
print(All.columns)#<- my sanity check
print(All['Insider Name'])#<- where the problem lies
Now if you look at the concatenated dataset, you'll see the "Insider Name" column. I want to isolate this column, but when I do, Python says:
KeyError: 'Insider Name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/ in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 'Insider Name'
So the column exists, but it also doesn't? Any tips would be greatly appreciated! Thanks in advance!
CodePudding user response:
The problem is that the character between "Insider" and "Name" is not a regular space but a non-breaking space (\xa0), so the key 'Insider Name' never matches. Renaming the column will fix the issue:
All.rename(columns={"Insider\xa0Name": "Insider Name"}, inplace=True)
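If you want to confirm this yourself, or clean every header at once instead of renaming one column, something like the following may help. It is only a sketch: the small DataFrame and its values are made up to stand in for your scraped table, and only the "Insider\xa0Name" header is taken from your data.

```python
import pandas as pd

# Hypothetical stand-in for the scraped table: the header contains a
# non-breaking space (U+00A0), which looks just like a normal space when printed.
df = pd.DataFrame({"Insider\xa0Name": ["Jane Doe", "John Roe"], "Qty": [100, 250]})

# repr() escapes the non-breaking space, so the hidden character becomes visible.
print([repr(c) for c in df.columns])

# Replace non-breaking spaces in every column label, then trim stray whitespace.
df.columns = df.columns.str.replace("\xa0", " ", regex=False).str.strip()

# The lookup with an ordinary space now succeeds.
print(df["Insider Name"])
```

Normalizing all the labels right after the concat means any other scraped header with the same hidden character gets fixed too, so you don't have to rename them one by one.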