Python: How to assign column titles to each iteration of a for loop from an existing array

I am a beginner. I have a static list, col_titles, and I'd like to take its values one at a time and use each as the column title for the corresponding iteration of a for loop. For example, the first value in col_titles becomes the column title for the first iteration, the second value becomes the title for the second iteration, and so on. Here's what I have so far:


data = []

col_titles = ['30024', '30033', '30038']

urls = [
'https://www.example.com/page1',
'https://www.example.com/page2',
'https://www.example.com/page3'
]

counter = 1

for url in urls:
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
    driver.close()    

print(data)

Currently, the output is a single list containing the values from every iteration, like so (each h2 stands for a unique h2 title from one of the URLs):

[h2, h2, h2, h2, h2, h2, h2, h2, None, None, h2, h2, h2, h2, None]

This is fine, as all I've done is append each iteration to the "data" array.

This is where I get stuck.

I think I should be creating a DataFrame within the for loop to grab a column title from the "col_titles" array, assigning it as a column title following (or preceding) each iteration of the for loop, but I don't know how to do this properly. What I'm hoping to achieve is an output like the following:

30024   30033   30038
h2      h2      h2
h2      h2      h2
h2      h2      h2
h2      None    h2
h2      None    None

Any insight is very appreciated!

CodePudding user response：

First, create a dictionary. For each URL, use the matching entry of col_titles as the key and the list collected in that iteration as the value. Then convert the dictionary to a DataFrame. The code would look something like this:

import pandas as pd

col_titles = ['30024', '30033', '30038']

urls = [
'https://www.example.com/page1',
'https://www.example.com/page2',
'https://www.example.com/page3'
]

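my_dict = {}

for ctr, url in enumerate(urls):
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    data = []
    counter = 1  # reset for every page so each column can get up to five titles
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
    # Pad short columns: pd.DataFrame needs all columns to be the same length.
    data.extend(["None"] * (5 - len(data)))
    my_dict[col_titles[ctr]] = data

driver.quit()  # close the browser once, after the last page

df = pd.DataFrame(my_dict)
print(df)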
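
The dictionary-to-DataFrame idea can be tried without a browser. The following is only a sketch under stated assumptions: the scraped list is a made-up stand-in for the h2 texts Selenium would return, and build_columns and MAX_ROWS are hypothetical names, not part of the original code. It keeps at most five titles per page and pads shorter pages with "None" so that every column has the same length.

```python
import pandas as pd

col_titles = ['30024', '30033', '30038']

# Hypothetical stand-in for the h2 texts scraped from each URL.
scraped = [
    ['a1', 'a2', 'a3', 'a4', 'a5', 'a6'],
    ['b1', 'b2', 'b3'],
    ['c1', 'c2', 'c3', 'c4'],
]

MAX_ROWS = 5  # assumed limit, matching the "counter <= 5" check


def build_columns(titles, pages, max_rows=MAX_ROWS):
    """Map each column title to a list of exactly max_rows values."""
    columns = {}
    for title, texts in zip(titles, pages):
        kept = texts[:max_rows]                             # drop anything past the limit
        kept = kept + ["None"] * (max_rows - len(kept))     # pad short pages
        columns[title] = kept
    return columns


df = pd.DataFrame(build_columns(col_titles, scraped))
print(df)
```

Padding matters because pd.DataFrame raises a ValueError when the lists in the dictionary have different lengths.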

CodePudding user response：

Use collections.defaultdict together with the zip function.
Because the result is passed to a pandas DataFrame as columns and values, a dictionary-like data structure is more convenient in your case.

Instead of data = [] initialize:

from collections import defaultdict

data = defaultdict(list)
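
If defaultdict is new to you, here is a small sketch of how it behaves, using only the standard library. The pages list is invented sample data standing in for the titles found on each page. One detail worth noticing: a key only appears once it has been accessed, so a page with no titles leaves no column behind unless you touch data[col] yourself.

```python
from collections import defaultdict

col_titles = ['30024', '30033', '30038']

# Hypothetical stand-in for the titles found on each page; the last page is empty.
pages = [['a1', 'a2'], ['b1'], []]

data = defaultdict(list)
for col, titles in zip(col_titles, pages):
    for title in titles:
        # The list for `col` is created automatically on first access.
        data[col].append(title)

print(dict(data))
print('30038' in data)  # the empty page never touched its key
```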

Then you iterate over your urls and accumulate values for each column separately:

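for col, url in zip(col_titles, urls):
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    counter = 1  # reset for every page
    try:
        for h2 in h2s:
            if counter <= 5:
                data[col].append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data[col].append("None")
    # Pad to five rows so that all columns have the same length.
    data[col].extend(["None"] * (5 - len(data[col])))

driver.quit()  # close the browser once, after the last page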

Finally, generating the DataFrame with pd.DataFrame(data) gives you a structure similar to this:

  30024 30033 30038
0    h2    h2    h2
1    h2    h2    h2
2    h2    h2    h2
3    h2    h2    h2
4  None  None  None
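
If the pages yield different numbers of titles, pd.DataFrame(data) raises a ValueError because the lists differ in length. One possible workaround, shown here as a sketch with invented sample data rather than real scraped titles, is to wrap each list in a pd.Series so that pandas aligns the columns on the index and fills the gaps, which can then be replaced with "None".

```python
import pandas as pd

# Hypothetical scraped titles with a different number of entries per column.
data = {
    '30024': ['a1', 'a2', 'a3', 'a4', 'a5'],
    '30033': ['b1', 'b2', 'b3'],
    '30038': ['c1', 'c2', 'c3', 'c4'],
}

# Each Series carries its own 0..n-1 index; pandas aligns them and
# marks the missing positions as NaN.
df = pd.DataFrame({col: pd.Series(values) for col, values in data.items()})

# Replace the missing entries with the same "None" marker used in the question.
df = df.fillna("None")
print(df)
```

This avoids padding each list by hand, at the cost of deciding afterwards how the missing cells should be displayed.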