Want to use join only in the second (inner) for loop to have all authors in one cell

By using two lists I now get 20 rows, which is good, but the list from the first loop still keeps appending to itself.

I want to have the list of authors in one cell, and for that I am using the .join() method.

A little intro to my code and what I am trying to accomplish:

The main page is a list of 20 items, and each item has a list of 4-5 authors. I want to iterate over the items first and then over each item's authors, so that each item's authors end up in one cell of the CSV.

It's a nightmare for me. I have spent days trying to figure out the answer; hopefully someone will understand the problem and help. Ask if you need more information, thank you. My code is below:

from selenium import webdriver
import pandas as pd
driver = webdriver.Chrome()
site = 'https://www.goodreads.com/search?q=chughtai&qid=WzdWh5nG8z'

driver.get(site)
driver.maximize_window()
authors = []
auth = []


main = driver.find_elements_by_tag_name('tr')
for i in main:
    con = i.find_elements_by_xpath('.//div[@]')
    for n in con:
        authors.append(n.find_element_by_xpath('.//a[@]/span').text)
        one_cell = ', '.join(authors)
    auth.append(one_cell)

a = {'Author Names': one_cell}
df = pd.DataFrame.from_dict(a, orient='index')
df = df.transpose()
df.to_csv("only_names.csv", index=False)
print(df)

CodePudding user response:

It seems the problem is that the authors list is not reset to empty before you parse a new item, so each joined string also contains the authors of every earlier item. One way to reset it is to move authors = [] from its current position to the line right after for i in main:. Then you get a new, empty list for each item.
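To see the effect without opening a browser, here is a minimal sketch that uses plain lists in place of the scraped elements. The rows data and both function names are made up for illustration; only the position of authors = [] differs between the two versions.

```python
# Hypothetical stand-in data: each inner list plays the role of one row's authors.
rows = [["Author A", "Author B"], ["Author C"], ["Author D", "Author E"]]


def join_without_reset(rows):
    authors = []  # created once, so it keeps growing across rows
    cells = []
    for row in rows:
        for name in row:
            authors.append(name)
        cells.append(", ".join(authors))
    return cells


def join_with_reset(rows):
    cells = []
    for row in rows:
        authors = []  # fresh, empty list for every row
        for name in row:
            authors.append(name)
        cells.append(", ".join(authors))
    return cells


print(join_without_reset(rows))  # later cells repeat the earlier authors
print(join_with_reset(rows))     # each cell holds only its own row's authors
```

In the first version the third cell contains all five names; in the second it contains only the last two.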

Another, non-critical suggestion is to move one_cell = ', '.join(authors) out of the inner loop, while keeping it before auth.append(one_cell). Both lines only need to run once for each i.

UPDATE:

To show my 2nd suggestion (with the first one applied as well):

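for i in main:
    authors = []  # start with an empty list for every row
    con = i.find_elements_by_xpath('.//div[@]')
    for n in con:
        authors.append(n.find_element_by_xpath('.//a[@]/span').text)
    # join once per row, after all of its authors have been collected
    one_cell = ', '.join(authors)
    auth.append(one_cell)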
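
One more thing that may be worth checking: the script in the question builds the DataFrame from one_cell, which is a single string, rather than from auth, which holds one joined string per row. A minimal sketch of building the frame from auth follows, assuming the goal is one row per item. The auth values here are hypothetical placeholders for the scraped names, and the CSV is written to an in-memory buffer only so the example runs without touching the disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for the auth list the loop builds (one joined string per row).
auth = ["Author A, Author B", "Author C", "Author D, Author E"]

# One column, one row per item; each cell holds that item's joined authors.
df = pd.DataFrame({"Author Names": auth})

# Written to an in-memory buffer here; pass a filename such as "only_names.csv" instead.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Cells that contain a comma are quoted in the CSV output, so each author list stays in a single cell when the file is opened in a spreadsheet.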