using loop to iterate the html table values python

Time:07-12

I'm trying to write a Python script that gets all of a student's grades using requests and bs4. Now I have a problem looping over the values:

for rows in tr:
    td = tbody.find_all('td')
    subject.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    ave.append(td[3].get_text())

for i in subject:
    print(f"Subject: {i}")

for i in fq:
    print(f"First Quarter: {i}")

for i in sq:
    print(f"Second Quarter: {i}")

for i in ave:
    print(f"Average: {i}")
# My goal: the 4 lists are linked by position, i.e. the first values of subject, fq, sq and ave belong together, like gen math (subject), 90 (fq), 90 (sq) and 90 (ave)

Output:

Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS 
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
Subject:  GENERAL MATHEMATICS
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00
First Quarter:   ##.00 
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Second Quarter:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00
Average:   ##.00

Expected Output:

Subject: Gen Math
Subject: Stats
...

First Quarter: 90.00
First Quarter: 90.00
...
Second Quarter: 90.00
Second Quarter: 90.00
...
Average: 90.00
Average: 90.00
...

I'm new to Python, so loops are my weakness. The code also seems wrong, since I need the subject, 1st-quarter grade, 2nd-quarter grade and average together. Thanks! This is the HTML code of the table:

<table cellspacing="0"  id="tblss1" width="100%">
<thead>
<tr >
<th style="text-align:center">SUBJECT</th>
<th style="text-align:center">1ST</th>
<th style="text-align:center">2ND</th>
<th style="text-align:center">AVE</th>
</tr>
</thead>
<tbody>
<tr>
<td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> PHYSICAL EDUCATION AND HEALTH </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50  </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL CHEMISTRY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> 21ST CENTURY LITERATURE </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> READING AND WRITING </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> GENERAL BIOLOGY 1 </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.00  </strong></td>
</tr>
<tr>
<td style="color:purple"> ENTREPRENEURSHIP </td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center"> <strong> ##.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> ##.50  </strong></td>
</tr>
</tbody>
</table>

Answer:

Based on my understanding, here is what you're trying to achieve. I'm assuming the first for loop actually adds all the data properly.

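subjects = []
fq = []
sq = []
avgs = []
for row in tr:
    # Look for the cells inside this row only; tbody.find_all('td')
    # would return every cell in the table on each pass.
    td = row.find_all('td')
    subjects.append(td[0].get_text())
    fq.append(td[1].get_text())
    sq.append(td[2].get_text())
    avgs.append(td[3].get_text())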

for subject in subjects:
    print(subject)

for f in fq:
    print(f)

for s in sq:
    print(s)

for a in avgs:
    print(a)
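Since the goal in the question is to keep each subject together with its grades, printing the four lists one after another will always separate them. One option is to walk the parallel lists in lockstep with the built-in zip(). A minimal sketch, using placeholder values because the real grades in the question are masked as ##.00:

```python
# Placeholder values: the real grades in the question are masked as ##.00.
subjects = ["GENERAL MATHEMATICS", "EARTH SCIENCE"]
fq = ["90.00", "88.00"]
sq = ["91.00", "90.00"]
avgs = ["90.50", "89.00"]

# zip() pairs up the items at the same position in each list, so every
# tuple is one table row: (subject, first quarter, second quarter, average).
rows = list(zip(subjects, fq, sq, avgs))

for subject, first, second, ave in rows:
    print(f"{subject}: 1st {first}, 2nd {second}, average {ave}")
```

Note that zip() stops at the shortest list, so all four lists need the same length or trailing rows are silently dropped.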

Answer:

You use i as the loop variable twice (outer and inner loop).

I am not sure the interpreter handles that "override" of the variable cleanly: after returning to the outer loop, the object/iterator cursor in i might be gone.

Try renaming the inner loop variable so it does not override i from the outer loop.

If this does not solve your issue, please describe in more detail what you are trying to achieve and what behavior you see.

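Edit: This way you will get the same result for every entry, because tbody.find_all('td') returns every cell in the whole table body, so td[0] to td[3] always refer to the cells of the first row. You need a nested loop that does the following steps: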

  1. Find all tr blocks and iterate over them:

for tr_block in tbody.find_all('tr'):

  2. In each tr_block, append the corresponding td cells to their lists:

    td = tr_block.find_all('td')
    subject.append(td[0].get_text())  # [...]

  3. After that you should have lists filled with all the data from the HTML, which you can then zip together into tuples if needed.
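The steps above can be combined into one runnable sketch. It assumes bs4 is installed, uses Python's built-in "html.parser", and parses a trimmed copy of the question's table with made-up grades (the real ones are masked as ##.00). The list names follow the question's code, with subjects and avgs as in the first answer.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's table; the grades are placeholder values.
html = """
<table cellspacing="0" id="tblss1" width="100%">
<thead><tr><th>SUBJECT</th><th>1ST</th><th>2ND</th><th>AVE</th></tr></thead>
<tbody>
<tr><td style="color:purple"> GENERAL MATHEMATICS </td>
<td align="center"> <strong> 90.00 </strong></td>
<td align="center"> <strong> 91.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> 90.50  </strong></td></tr>
<tr><td style="color:purple"> EARTH SCIENCE </td>
<td align="center"> <strong> 88.00 </strong></td>
<td align="center"> <strong> 90.00 </strong></td>
<td align="center" style="color:blueviolet"> <strong> 89.00  </strong></td></tr>
</tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
tbody = soup.find("table", id="tblss1").find("tbody")

subjects, fq, sq, avgs = [], [], [], []

# Step 1: iterate over every <tr> in the table body.
for tr_block in tbody.find_all("tr"):
    # Step 2: search for <td> cells inside this row only, not the whole tbody.
    td = tr_block.find_all("td")
    # strip=True removes the padding spaces around each value.
    subjects.append(td[0].get_text(strip=True))
    fq.append(td[1].get_text(strip=True))
    sq.append(td[2].get_text(strip=True))
    avgs.append(td[3].get_text(strip=True))

# Step 3: zip the parallel lists so each subject stays with its grades.
for subject, first, second, ave in zip(subjects, fq, sq, avgs):
    print(f"Subject: {subject}")
    print(f"First Quarter: {first}")
    print(f"Second Quarter: {second}")
    print(f"Average: {ave}")
```

The important change from the question's loop is tr_block.find_all("td") instead of tbody.find_all("td"); the zip at the end is one way to print each subject next to its own grades.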

Answer:

In cases like this, it's simpler and faster to read the table into a dataframe:

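import pandas as pd
from io import StringIO
table = """[your html above]"""
# read_html returns a list of DataFrames (one per <table>), so take the first.
# Wrapping the string in StringIO avoids a deprecation warning in newer pandas.
print(pd.read_html(StringIO(table))[0])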

Output:

                         SUBJECT    1ST    2ND    AVE
0            GENERAL MATHEMATICS  ##.00  ##.00  ##.00
1                  EARTH SCIENCE  ##.00  ##.00  ##.00
2  PHYSICAL EDUCATION AND HEALTH  ##.00  ##.00  ##.50
3            GENERAL CHEMISTRY 1  ##.00  ##.00  ##.00

etc.
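Once the table is in a DataFrame, each row already keeps a subject together with its grades, which is what the question is after. A minimal sketch, assuming pandas plus lxml (or bs4 with html5lib) is installed, and using a trimmed copy of the table with placeholder grades since the real ones are masked:

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the question's table; the grades are placeholder values.
html = """<table id="tblss1">
<thead><tr><th>SUBJECT</th><th>1ST</th><th>2ND</th><th>AVE</th></tr></thead>
<tbody>
<tr><td> GENERAL MATHEMATICS </td><td><strong> 90.00 </strong></td>
<td><strong> 91.00 </strong></td><td><strong> 90.50 </strong></td></tr>
<tr><td> EARTH SCIENCE </td><td><strong> 88.00 </strong></td>
<td><strong> 90.00 </strong></td><td><strong> 89.00 </strong></td></tr>
</tbody></table>"""

# read_html returns a list of DataFrames, one per <table>; take the first.
df = pd.read_html(StringIO(html))[0]

# itertuples(index=False, name=None) yields plain tuples, one per row.
for subject, first, second, ave in df.itertuples(index=False, name=None):
    print(f"{subject}: 1st {first:.2f}, 2nd {second:.2f}, average {ave:.2f}")
```

Because read_html converts numeric-looking cells to numbers, the grades come back as floats such as 90.5 rather than the string "90.50"; the :.2f format spec restores two decimals when printing.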
