Python: How to assign column titles to each iteration of a for loop from an existing array

Time:01-08

I am a beginner. I have a static list, col_titles, and I'd like to pull its values one at a time and use each as the column title for one iteration of a for loop. For example, the results of the first iteration should go under the first value in col_titles, the results of the second iteration under the second value, and so on. Here's what I have so far:


data = []

col_titles = ['30024', '30033', '30038']

urls = [
'https://www.example.com/page1',
'https://www.example.com/page2',
'https://www.example.com/page3'
]

counter = 1

for url in urls:
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
    driver.close()    

print(data)

Currently, the output is a single flat list containing the values from every iteration, like so (each h2 stands for a unique h2 title from one of the urls):

[h2, h2, h2, h2, h2, h2, h2, h2, None, None, h2, h2, h2, h2, None]

This is expected, as all I've done is append the result of each iteration to the "data" list.

This is where I get stuck.

I think I should be creating a DataFrame within the for loop, taking a column title from the "col_titles" list and assigning it after (or before) each iteration, but I don't know how to do this properly. What I'm hoping to achieve is an output like the following:

30024   30033   30038
h2      h2      h2
h2      h2      h2
h2      h2      h2
h2      None    h2
h2      None    None

Any insight is very appreciated!

CodePudding user response:

First create a dictionary. On each iteration, use the next entry of col_titles as the key and the list collected in that iteration as the value. Then pass the dictionary to pd.DataFrame. The code would look something like this:

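import pandas as pd
from selenium.common.exceptions import ElementNotVisibleException, NoSuchElementException
from selenium.webdriver.common.by import By

# `driver` is assumed to be the Selenium WebDriver already created in the question's setup
col_titles = ['30024', '30033', '30038']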

urls = [
'https://www.example.com/page1',
'https://www.example.com/page2',
'https://www.example.com/page3'
]

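ctr = 0
my_dict = {}

for url in urls:
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    data = []
    counter = 1  # reset for every page so each column gets up to five titles
    try:
        for h2 in h2s:
            if counter <= 5:
                data.append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data.append("None")
    # pad short columns: pd.DataFrame needs lists of equal length
    data = data + ["None"] * (5 - len(data))
    my_dict[col_titles[ctr]] = data  # use the current title, then advance
    ctr = ctr + 1
driver.close()  # close the browser only after the last page
df = pd.DataFrame(my_dict)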
print(df)
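
One detail worth illustrating: pd.DataFrame raises a ValueError when the lists in the dictionary have different lengths. As an alternative to padding by hand, a minimal sketch (the `scraped` dictionary and its values are made up and stand in for the Selenium results) wraps each list in a pd.Series so that pandas fills the gaps with NaN:

```python
import pandas as pd

# Hypothetical scraped results: one list of h2 titles per column title.
scraped = {
    "30024": ["a1", "a2", "a3", "a4", "a5"],
    "30033": ["b1", "b2", "b3"],
    "30038": ["c1", "c2", "c3", "c4"],
}

# Wrapping each list in a Series lets pandas align the columns on the
# row index and fill the missing cells with NaN.
df = pd.DataFrame({title: pd.Series(values) for title, values in scraped.items()})
print(df)
```

If a literal "None" string is preferred over NaN, df.fillna("None") should give that.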

CodePudding user response:

Use collections.defaultdict and the built-in zip function.
In your case it's more convenient to collect the results in a dictionary-like structure, which can then be passed to a pandas DataFrame as columns and values.

Instead of data = [], initialize:

from collections import defaultdict

data = defaultdict(list)

Then iterate over col_titles and urls in parallel and accumulate the values for each column separately:

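for col, url in zip(col_titles, urls):
    driver.get(url)
    h2s = driver.find_elements(By.TAG_NAME, 'h2')
    counter = 1  # reset for every page so each column gets up to five titles
    try:
        for h2 in h2s:
            if counter <= 5:
                data[col].append(h2.get_attribute("innerText"))
                counter = counter + 1
    except (ElementNotVisibleException, NoSuchElementException):
        data[col].append("None")
    # pad short columns so every column ends up with five rows
    data[col] += ["None"] * (5 - len(data[col]))
driver.close()  # close the browser only after the last page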

Finally, pd.DataFrame(data) produces a structure similar to this:

  30024 30033 30038
0    h2    h2    h2
1    h2    h2    h2
2    h2    h2    h2
3    h2    h2    h2
4  None  None  None
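
The same pairing of titles and pages can be tried without a browser. The following is only a sketch under assumptions: `fake_pages` and its contents are invented to stand in for the Selenium lookup, and `MAX_ROWS` is a hypothetical name for the five-title limit. It uses zip to pair each column title with its url, and itertools.zip_longest to turn the columns into rows, filling gaps with "None":

```python
from collections import defaultdict
from itertools import zip_longest

col_titles = ["30024", "30033", "30038"]
urls = [
    "https://www.example.com/page1",
    "https://www.example.com/page2",
    "https://www.example.com/page3",
]

# Hypothetical stand-in for the Selenium lookup: each url maps to its h2 titles.
fake_pages = {
    urls[0]: ["a1", "a2", "a3", "a4", "a5", "a6"],
    urls[1]: ["b1", "b2", "b3"],
    urls[2]: ["c1", "c2", "c3", "c4"],
}

MAX_ROWS = 5  # keep at most five titles per page
data = defaultdict(list)

# zip pairs the first title with the first url, the second with the second, etc.
for col, url in zip(col_titles, urls):
    for text in fake_pages[url][:MAX_ROWS]:
        data[col].append(text)

# zip_longest transposes the columns into rows and pads short columns.
rows = list(zip_longest(*(data[col] for col in col_titles), fillvalue="None"))

print("\t".join(col_titles))
for row in rows:
    print("\t".join(row))
```

Because zip stops at the shorter of its inputs, a url without a matching title (or the reverse) would be silently skipped, so the two lists should be kept the same length.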