I'm trying to copy search result tables from a website into an excel sheet But the data isn't very clean and it's causing some issues with outputting the lists into a pandas dataframe.
There is 15 columns of data, but the last 2 have blank headers and 2 of them have duplicate headers. Which (I think) is causing me to get this error "ValueError: All arrays must be of the same length"
Realistically I only need the first 9 columns of the table which in this case means there won't be any duplicate or blank headers in the data anymore.
Is there a way to limit find_elements to get the first 9 columns of the table rather than all columns? Or to fix the headers so that there are no longer any duplicates or blanks?
Any help is greatly appreciated thank you.
for x in result:
driver.find_element(By.XPATH, '//*[@id="sidemenu"]/table/tbody/tr[1]/td/form/div[2]/input[1]').send_keys(x)
driver.implicitly_wait(2)
driver.find_element(By.XPATH, '//*[@id="navsrch"]').click()
driver.implicitly_wait(2)
headers = []
columns = dict()
table_id = driver.find_element(By.ID, 'invoice')
all_rows = table_id.find_elements(By.TAG_NAME, "tr")
row = all_rows[0]
all_items = row.find_elements(By.TAG_NAME, "th")
for item in all_items:
name = item.text
columns[name] = []
headers.append(name)
print(headers)
for row in all_rows[1:]:
all_items = row.find_elements(By.TAG_NAME, "td")
for name, item in zip(headers, all_items):
value = item.text
columns[name].append(value)
print(columns)
df = pd.DataFrame(columns)
print(df)
driver.close()
CodePudding user response:
Fixed issue by looping through the specificed amount of times that I needed and iterating the xpath position each loop and sending the data straight to excel on each loop. Thanks for your help Aaron for the iterating idea & Arundeep for the xpath position idea.
for x in result:
driver.find_element(By.XPATH, '//*[@id="sidemenu"]/table/tbody/tr[1]/td/form/div[2]/input[1]').send_keys(x)
driver.implicitly_wait(2)
driver.find_element(By.XPATH, '//*[@id="navsrch"]').click()
driver.implicitly_wait(5)
row = driver.find_elements(By.XPATH, '//*[@id="invoice"]/tbody/tr')
lrow = len(row) 1
for i in range(1, lrow):
l_e_r = len(list(ws.rows)) 1
for y in range(1, 10):
rows = driver.find_element(By.XPATH, '//*[@id="invoice"]/tbody/tr[' str(i) ']/td[' str(y) ']').text
ws.cell(row=l_e_r, column=y).value = rows
wb.save("test.xlsx")
driver.close()