I'm trying to open links built from my dataframe using Selenium WebDriver. The dataframe 'df1' looks like this:
| | user | repo1 | repo2 | repo3 |
|---|---|---|---|---|
| 0 | breed | cs149-f22 | kattis2canvas | grpc-maven-skeleton |
| 1 | GrahamDumpleton | mod_wsgi | wrapt | NaN |
The links I want to open are built from the 'user' column and one of the three 'repo' columns. I get an error when I iterate over the 'repo' columns.
Could anyone help me out? Thank you!
Here is my best try:
repo_cols = [col for col in df1.columns if 'repo' in col]
for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            repo = row['repo_name']
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            driver.get(current_url)
            time.sleep(0.5)
        except:
            pass
Here is the bug I encounter:
KeyError: 'repo_name'
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3079 try:
-> 3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'repo_name'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-50-eb068230c3fd> in <module>
4 user = row['user']
5 for repo_name in repo_cols:
----> 6 repo = row['repo_name']
7 current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
8 driver.get(current_url)
~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
851
852 elif key_is_scalar:
--> 853 return self._get_value(key)
854
855 if is_hashable(key):
~\anaconda3\lib\site-packages\pandas\core\series.py in _get_value(self, label, takeable)
959
960 # Similar to Index.get_value, but we do not fall back to positional
--> 961 loc = self.index.get_loc(label)
962 return self.index._get_values_for_loc(self, loc, label)
963
~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
3080 return self._engine.get_loc(casted_key)
3081 except KeyError as err:
-> 3082 raise KeyError(key) from err
3083
3084 if tolerance is not None:
KeyError: 'repo_name'
CodePudding user response:
I think you should remove the quotation marks around repo_name in this line:
repo = row['repo_name']
It should be:
repo = row[repo_name]
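To see why the quotes matter, here is a minimal sketch that uses a plain dict as a stand-in for one row of df1 (an assumption, so it runs without pandas; a pandas row raises the same kind of KeyError for a label that does not exist):

```python
# A plain dict standing in for one DataFrame row (assumption for illustration).
row = {"user": "breed", "repo1": "cs149-f22", "repo2": "kattis2canvas"}
repo_cols = [col for col in row if "repo" in col]

values = []
for repo_name in repo_cols:
    # row[repo_name] uses the loop variable, so it looks up 'repo1', then 'repo2'.
    values.append(row[repo_name])

try:
    # row["repo_name"] looks up the literal label 'repo_name', which does not exist.
    row["repo_name"]
    literal_lookup_failed = False
except KeyError:
    literal_lookup_failed = True
```

With the quotes, every iteration asks for the same non-existent label, no matter what the loop variable currently holds.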
CodePudding user response:
You're getting the KeyError because there is no column named 'repo_name'. You need to replace row['repo_name'] with row[repo_name].
Try this:
import time

import pandas as pd
from selenium import webdriver

df1 = pd.DataFrame({'user': ['breed', 'GrahamDumpleton'],
                    'repo1': ['cs149-f22', 'mod_wsgi'],
                    'repo2': ['kattis2canvas', 'wrapt']})

repo_cols = [col for col in df1.columns if 'repo' in col]

# Start the browser once and reuse it for every URL.
browser = webdriver.Chrome()

for index, row in df1.iterrows():
    user = row['user']
    for repo_name in repo_cols:
        try:
            # Use the loop variable, not the string literal 'repo_name'.
            repo = row[repo_name]
            current_url = f'https://github.com/{user}/{repo}/graphs/contributors'
            browser.get(current_url)
            time.sleep(0.5)
        except Exception as err:
            # Report the failure instead of silently swallowing it.
            print(f'Could not open {user}/{repo_name}: {err}')
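One more thing to watch for: the table in the question has a NaN in repo3, so once the KeyError is fixed the loop would try to open a URL ending in /nan/graphs/contributors. Below is a hedged sketch of skipping those cells. The helper name build_urls is hypothetical, and it assumes the rows are passed as plain dicts (for example from df1.to_dict('records')), so it can be tried without a browser:

```python
import math


def build_urls(rows, repo_cols):
    """Return contributor-graph URLs, skipping missing (NaN/None) repos."""
    urls = []
    for row in rows:
        user = row["user"]
        for repo_name in repo_cols:
            repo = row.get(repo_name)
            # Missing cells show up as float NaN (or None); skip them.
            if repo is None or (isinstance(repo, float) and math.isnan(repo)):
                continue
            urls.append(f"https://github.com/{user}/{repo}/graphs/contributors")
    return urls


# Sample rows copied from the table in the question.
rows = [
    {"user": "breed", "repo1": "cs149-f22",
     "repo2": "kattis2canvas", "repo3": "grpc-maven-skeleton"},
    {"user": "GrahamDumpleton", "repo1": "mod_wsgi",
     "repo2": "wrapt", "repo3": float("nan")},
]
urls = build_urls(rows, ["repo1", "repo2", "repo3"])
```

Each entry in urls can then be passed to browser.get(...) in a simple loop, which keeps the URL logic separate from the Selenium calls.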