BeautifulSoup in Python: get multiple <tr> tags from multiple <div> tags?

Time:04-09


This is the HTML:

<div id="compare">
  <div class="students">
    <tbody>
      <tr>
        <td >
          <a href="/overview">sami</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fadi</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">achraf</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>

  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">john</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">noor</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">dadi</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>
  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">ham</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fathe</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">kali</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>

I am trying to get the value of every <td>, but my code only gives me the cells of the first <tr> in each <div>.

I can't select the other <td> elements individually because the same tag is repeated in every row.

This is my Python code:

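from bs4 import BeautifulSoup

# response_url is the HTTP response fetched earlier (not shown in the question)
contents = BeautifulSoup(response_url.content, "lxml")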

table_body = contents.find('div', {'id': 'compare'})
print(len(table_body))


rowss = table_body.find_all('div', {'class': 'students'})
print(len(rowss))


for section in rowss:

  print('--------------------------')
  try:
    name = section.find_next('td', {'class': 'argaam-font company-short-name'}).text
    print(name)
    name1 = section.find_next('td', {'class': 'center', 'rid': 'number1'}).text
    print(name1)
    name2 = section.find_next('td', {'rid': 'number'}).text
    print(name2)
    name3 = section.find_next('td', {'class': 'center'}).text
    print(name3)
    name4 = section.find_next('td', {'class': 'center'}).text
    print(name4)

    name5 = section.find_next('td', {'class': 'center'}).text
    print(name5)

  except:
    name = ''
    print('error')

Result:

--------------------------
sami
32
20
first1
second1
third1
--------------------------
john
32
20
first1
second1
third1
--------------------------
ham
32
20
first1
second1
third1

But I want results like this:

--------------------------
sami
32
20
first1
second1
third1
fadi
10
36
first2
second2
third2
achraf
32
20
first3
second3
third3
--------------------------
john
32
20
first1
second1
third1
noor
10
36
first2
second2
third2
dadi
32
20
first3
second3
third3
--------------------------
ham
32
20
first1
second1
third1
fathe
10
36
first2
second2
third2
kali
32
20
first3
second3
third3

I am using Python to scrape the data with BeautifulSoup.

CodePudding user response:

You can do that using a CSS selector and stripped_strings:

html_doc='''
<div id="compare">
  <div class="students">
    <tbody>
      <tr>
        <td >
          <a href="/overview">sami</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fadi</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">achraf</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>

  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">john</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">noor</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">dadi</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>
  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">ham</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fathe</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">kali</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>
'''


from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc,'html.parser')
for tr in soup.select('.students tbody tr'):
    print(list(tr.stripped_strings))

Output:

['sami', '32', '20', 'first1', 'second1', 'third1']
['fadi', '10', '36', 'first2', 'second2', 'third2']
['achraf', '32', '20', 'first3', 'second3', 'third3']
['john', '32', '20', 'first1', 'second1', 'third1']
['noor', '10', '36', 'first2', 'second2', 'third2']
['dadi', '32', '20', 'first3', 'second3', 'third3']
['ham', '32', '20', 'first1', 'second1', 'third1']
['fathe', '10', '36', 'first2', 'second2', 'third2']
['kali', '32', '20', 'first3', 'second3', 'third3']
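The loop above returns one flat sequence of rows. To keep the rows grouped per <div>, as the question asks, one option is to loop over the sections and take only the <tr> elements that are direct children of each section's <tbody>. This is a minimal sketch, not a definitive solution: the shortened `sample` markup and the names `groups` and `section` are illustrative, the `class="students"` attribute is assumed from the selectors used in this thread, and the result depends on `html.parser` (other parsers such as lxml or html5lib may build a different tree from this unclosed markup).

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical version of the markup from the question.
# The <div> and <tbody> tags are left unclosed on purpose, as in the original.
sample = '''
<div id="compare">
  <div class="students">
    <tbody>
      <tr><td><a href="/overview">sami</a></td><td>32</td></tr>
      <tr><td><a href="/overview">fadi</a></td><td>10</td></tr>
  <div class="students">
    <tbody>
      <tr><td><a href="/overview">john</a></td><td>32</td></tr>
      <tr><td><a href="/overview">noor</a></td><td>10</td></tr>
'''

soup = BeautifulSoup(sample, "html.parser")

groups = []
for section in soup.select("div.students"):
    # First <tbody> below this section
    tbody = section.find("tbody")
    # recursive=False keeps only direct <tr> children, so rows that belong
    # to a later section nested further down are not picked up here
    rows = tbody.find_all("tr", recursive=False)
    groups.append([list(tr.stripped_strings) for tr in rows])

for group in groups:
    print('--------------------------')
    for row in group:
        print(*row, sep='\n')
```

Each entry in `groups` holds the rows of one section, so the separator line is printed once per <div> rather than once per row.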

CodePudding user response:

The main issue here is the ill-fated nesting, so to get your expected output you have to select your elements more specifically. One possible strategy is to select the first <tr> that is a direct child of each <tbody>, and then only its direct <tr> siblings:

for d in soup.select('div.students tbody > tr:nth-of-type(1)'):
    print('---------------------------')
    l = []
    l.append(list(d.stripped_strings))
    l.extend([list(n.stripped_strings) for n in d.find_next_siblings('tr')])
    print(*[e for lst in l for e in lst], sep='\n')
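The nesting problem can be checked directly. The sketch below is only an illustration under stated assumptions: the shortened `sample` markup is hypothetical, `class="students"` is assumed from the selector above, and the tree shown is what `html.parser` builds (other parsers may repair the markup differently). Because the first <div> and its <tbody> are never closed, the second <div> ends up inside the first one, and a plain descendant search from the first <div> also finds the rows of the second.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup with the same unclosed <div>/<tbody> tags
sample = '''
<div id="compare">
  <div class="students">
    <tbody>
      <tr><td>sami</td></tr>
  <div class="students">
    <tbody>
      <tr><td>john</td></tr>
'''

soup = BeautifulSoup(sample, "html.parser")
first, second = soup.select("div.students")

# The second section is a child of the first section's still-open <tbody>
parent_name = second.parent.name

# A descendant search from the first section therefore sees both rows
rows_under_first = [tr.get_text(strip=True) for tr in first.select("tr")]
rows_under_second = [tr.get_text(strip=True) for tr in second.select("tr")]

print(parent_name)
print(rows_under_first)
print(rows_under_second)
```

This is why the selector uses the child combinator (`tbody > tr`) and sibling lookups instead of a descendant search.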
Example
html = '''
<div id="compare">
  <div class="students">
    <tbody>
      <tr>
        <td >
          <a href="/overview">sami</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fadi</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">achraf</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>

  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">john</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">noor</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">dadi</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>
  <div class="students">
     <tbody>
      <tr>
        <td >
          <a href="/overview">ham</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first1</td>
        <td >second1</td>
        <td >third1</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">fathe</a></td>
        <td  rid="number1">10</td>
        <td  rid="number">36</td>
        <td >first2</td>
        <td >second2</td>
        <td >third2</td>
      </tr>
      <tr>
        <td >
          <a href="/overview">kali</a></td>
        <td  rid="number1">32</td>
        <td  rid="number">20</td>
        <td >first3</td>
        <td >second3</td>
        <td >third3</td>
      </tr>
'''


from bs4 import BeautifulSoup
soup = BeautifulSoup(html, 'html.parser')
  
for d in soup.select('div.students tbody > tr:nth-of-type(1)'):
    print('---------------------------')
    l = []
    l.append(list(d.stripped_strings))
    l.extend([list(n.stripped_strings) for n in d.find_next_siblings('tr')])
    print(*[e for lst in l for e in lst], sep='\n')
Output
---------------------------
sami
32
20
first1
second1
third1
fadi
10
36
first2
second2
third2
achraf
32
20
first3
second3
third3
---------------------------
john
32
20
first1
second1
third1
noor
10
36
first2
second2
third2
dadi
32
20
first3
second3
third3
---------------------------
ham
32
20
first1
second1
third1
fathe
10
36
first2
second2
third2
kali
32
20
first3
second3
third3
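If the individual cells are needed by name rather than as a flat list, the repeated <td> tags can be told apart by position within the row or by the `rid` attribute that appears in the question's HTML. The following is a rough sketch under assumptions: the well-formed `sample` table, the `records` list and the dictionary keys are illustrative names, not part of the original page.

```python
from bs4 import BeautifulSoup

# Hypothetical, well-formed excerpt with the same cell layout as the question
sample = '''
<table>
  <tbody>
    <tr>
      <td><a href="/overview">sami</a></td>
      <td rid="number1">32</td>
      <td rid="number">20</td>
      <td>first1</td>
    </tr>
    <tr>
      <td><a href="/overview">fadi</a></td>
      <td rid="number1">10</td>
      <td rid="number">36</td>
      <td>first2</td>
    </tr>
  </tbody>
</table>
'''

soup = BeautifulSoup(sample, "html.parser")

records = []
for tr in soup.select("tr"):
    # Direct <td> children of this row, in document order
    cells = tr.find_all("td", recursive=False)
    records.append({
        # The first cell holds the name link
        "name": cells[0].get_text(strip=True),
        # Attribute selectors match the exact rid value
        "number1": tr.select_one('td[rid="number1"]').get_text(strip=True),
        "number": tr.select_one('td[rid="number"]').get_text(strip=True),
        # Remaining cells have no distinguishing attribute, so use position
        "extra": [c.get_text(strip=True) for c in cells[3:]],
    })

for record in records:
    print(record)
```

Searching within each `tr` instead of calling `find_next` from the enclosing <div> is what keeps every lookup inside its own row.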