I need to pass the result of soup.find_all to another soup.find_all function to filter the HTML code

Time:07-20

I have this HTML code for example:

                    <table class="nested4">
                    <tr>
                        <td colspan="1"></td>
                        <td colspan="2">
                            <h2  id="govtMsg" visible="false"></h2>
                        </td>
                        <td colspan="2">
                            <h2 > Net Metering Conn. </h2>
                        </td>
                        <td colspan="2">
                            <h2  hidden> Life Line Consumer</h2>
                        </td>
                    </tr>
                    <tr>
                        <td colspan="2">
                            <p style="margin: 0; text-align: left; padding-left: 5px">
                                <span>NAME & ADDRESS</span>
                                <br />
                                <span>MUHAMMAD AMIN                 </span>
                                <br />
                                <span>S/O MUHAMMAD KHAN             </span>
                                <br />
                                <span>H-NO.38 MARGALLA ROAD         </span>
                                <br />
                                <span>F-6/3 ISLAMABAD3              </span>
                                <br />
                                <span></span>
                                
                                
                            </p>
                        </td>
                        <td colspan="3" style="text-align: left">
                            <h2 >Say No To Corruption</h2>
                            

                            <span style="font-size: 8pt; color: #78578e"> MCO Date : 10-Aug-2018</span>
                            <br />

                            

                        </td>
                        <td>
                            <h3 style="font-size: 14pt;"> </h3>
                            <h2>  <br /> </h2>
                        </td>
                    </tr>
                    <tr>
                        <td style="margin-top: 0;" >
                            
                            
                            
                            <br />
                            
                        </td>
                        <td colspan="1" style="margin-top: 0;" >
                        </td>
                        <td colspan="1" style="margin-top: 0;" >
                            
                        </td>
                    </tr>
                    <tr style="height: 7%;" >
                        <td style="width: 130px" >
                            <h4>METER NO</h4>
                        </td>
                        <td style="width: 90px" >
                            <h4>PREVIOUS READING</h4>
                        </td>
                        <td style="width: 90px" >
                            <h4>PRESENT READING</h4>
                        </td>
                        <td style="width: 60px" >
                            <h4>MF</h4>
                        </td>
                        <td style="width: 60px" >
                            <h4>UNITS</h4>
                        </td>
                        <td>
                            <h4>STATUS</h4>
                        </td>
                    </tr>
                    <tr style="height: 30px" >
                        <td >
                            3-P   I 3301539<br> I 3301539<br> E 3301539<br> E 3301539<br>
                        </td>
                        <td >
                            78693<br>16823<br>19740<br>8<br>
                        </td>
                        <td >
                            80086<br>17210<br>20139<br>8<br>
                        </td>
                        <td >
                            1<br>1<br>1<br>1<br>
                        </td>
                        <td >
                            1393<br>387<br>399<br>0<br>
                        </td>
                        <td>
                            
                        </td>
                    </tr>
                    <tr id="roshniMsg" style="height: 30px" >
<td colspan="6">
                            <div style="width: 452pt">
                                <img style="max-width: 100%; max-height: 35%" src="/images/companies/iesco/roshniMsg.jpg"
                                    alt="Roshni Message" />
                            </div>
                        </td>
                     </tr>     
    </table>

From this table I want to extract the paragraph, and then get all the span tags inside that paragraph. I used soup.find_all() to get the table, but I don't know how to apply the function again to that result so that I can find the paragraph and then the span tags within it.

This is the Python code I wrote:

soup = BeautifulSoup(string, 'html.parser')
#Getting the table tag
results = soup.find_all('table', attrs={'class':'nested4'})
#Getting the paragraph tag
results = soup.find_all('p', attrs={'style':'margin: 0; text-align: left; padding-left: 5px'})
#Getting all the span tags
results = soup.find_all('span', attrs={})

I just need help getting the paragraphs within the table, and then the spans within that paragraph; at the moment I get the spans from the whole of the original HTML. I don't know how to pass the list of bs4 objects back to the soup object so that I can call soup.find_all iteratively.

CodePudding user response:

from bs4 import BeautifulSoup

html = '''
<table class="nested4">
                    <tr>
                        <td colspan="1"></td>
                        <td colspan="2">
                            <h2  id="govtMsg" visible="false"></h2>
                        </td>
                        <td colspan="2">
                            <h2 > Net Metering Conn. </h2>
                        </td>
                        <td colspan="2">
                            <h2  hidden> Life Line Consumer</h2>
                        </td>
                    </tr>
                    <tr>
                        <td colspan="2">
                            <p style="margin: 0; text-align: left; padding-left: 5px">
                                <span>NAME & ADDRESS</span>
                                <br />
                                <span>MUHAMMAD AMIN                 </span>
                                <br />
                                <span>S/O MUHAMMAD KHAN             </span>
                                <br />
                                <span>H-NO.38 MARGALLA ROAD         </span>
                                <br />
                                <span>F-6/3 ISLAMABAD3              </span>
                                <br />
                                <span></span>
                                
                                
                            </p>
                        </td>
                        <td colspan="3" style="text-align: left">
                            <h2 >Say No To Corruption</h2>

'''
soup = BeautifulSoup(html, 'html.parser')
spans = soup.select_one('table.nested4').select('span')
for span in spans:
    print(span.text)

This returns:

NAME & ADDRESS
MUHAMMAD AMIN                 
S/O MUHAMMAD KHAN             
H-NO.38 MARGALLA ROAD         
F-6/3 ISLAMABAD3  
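
The two chained calls above can probably be collapsed into one, since select() accepts a descendant selector. This is only a minimal sketch: it assumes the table carries class "nested4" as in the question's code, and SAMPLE_HTML and span_texts are made-up names with a shortened stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the page in the question.
SAMPLE_HTML = """
<table class="nested4">
  <tr><td>
    <p style="margin: 0; text-align: left; padding-left: 5px">
      <span>NAME &amp; ADDRESS</span><br />
      <span>MUHAMMAD AMIN                 </span><br />
      <span></span>
    </p>
  </td></tr>
</table>
<span>outside the table</span>
"""


def span_texts(html):
    """Return the stripped, non-empty texts of spans inside a <p> inside table.nested4."""
    soup = BeautifulSoup(html, "html.parser")
    # "table.nested4 p span" matches spans nested anywhere under a <p>
    # that is itself nested anywhere under the table.
    texts = [span.get_text(strip=True) for span in soup.select("table.nested4 p span")]
    # Drop the empty placeholder span.
    return [text for text in texts if text]


print(span_texts(SAMPLE_HTML))
```

Because the selector is scoped to the table, the span outside the table is not returned.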

 

CodePudding user response:

If you have a single table:

from bs4 import BeautifulSoup

# string holds the HTML from the question
soup = BeautifulSoup(string, 'html.parser')
table = soup.find('table', attrs={'class': 'nested4'})
p = table.find('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'})
results = p.find_all('span')
for result in results:
    print(result.get_text(strip=True))

If you have a list of tables:

soup = BeautifulSoup(string, 'html.parser')
for table in soup.find_all('table', attrs={'class': 'nested4'}):
    for p in table.find_all('p', attrs={'style': 'margin: 0; text-align: left; padding-left: 5px'}):
        for span in p.find_all('span'):
            print(span.get_text(strip=True))
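
The chaining in both variants works because find() returns a single Tag and find_all() returns a list of Tag objects, and every Tag has its own find() and find_all() that search only its descendants. The list itself has no find_all(), which is why you either loop over it or take one element. Note also that find() returns None when nothing matches, so a guarded version may be safer. A rough sketch, where spans_in_table is a hypothetical helper and taking the first <p> in the table (rather than matching on the style string) is an assumption:

```python
from bs4 import BeautifulSoup


def spans_in_table(html, table_class="nested4"):
    """Return stripped span texts from the first <p> inside the first matching table.

    Hypothetical helper: the default class name comes from the question's code,
    and using the first <p> in the table is an assumption.
    """
    soup = BeautifulSoup(html, "html.parser")

    # find() yields one Tag or None; guard before searching inside it.
    table = soup.find("table", class_=table_class)
    if table is None:
        return []

    # Searching from the Tag restricts the search to the table's descendants.
    paragraph = table.find("p")
    if paragraph is None:
        return []

    return [span.get_text(strip=True) for span in paragraph.find_all("span")]


# Small usage example with made-up HTML.
example = (
    '<span>ignored</span>'
    '<table class="nested4"><tr><td><p><span> A </span><span>B</span></p></td></tr></table>'
)
print(spans_in_table(example))
```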