Beautifulsoup4 find and list children with and by name

Time:09-25

So I have this XML file mockup:

xml="""
<fruits>
  <fruit>
    <name>apple</name>
    <types>
      <type>
        <color>red</color>
        <taste>sweet</taste>
        <size>big</size>
        <description>Nice, round, sweet red apple</description>
      </type>
      <type>
        <color>green</color>
        <taste>sour</taste>
        <size>medium</size>
        <description>Small, sour, green apple</description>
      </type>
    </types>
  </fruit>
  <fruit>
    <name>Banana</name>
    <types>
      <type>
        <color>yellow</color>
        <taste>sweet</taste>
        <size>small</size>
        <description>Good for banana-smoothies only</description>
      </type>
      <type>
        <color>green</color>
        <taste>Bitter</taste>
        <size>big</size>
        <description>Not quite ripe yet</description>
      </type>
    </types>
  </fruit>
</fruits>
"""

And I'm trying to use this code:

from bs4 import BeautifulSoup
soup=BeautifulSoup(xml, 'lxml')

fruits=soup.findAll("fruit", recursive=False)
print(fruits)

type=soup.findAll("type")

list=[]

name=soup.findAll("name")

for nameid in range(len(name)):
    list +=name[nameid]

    for id in range(len(type)):
        list +=(soup.findAll("color")[id].string)
        list +=(soup.findAll("taste")[id].string)
        list +=(soup.findAll("size")[id].string)
        list +=(soup.findAll("description")[id].string)
        list +=("""</tr>""")
        #list.append("<td>" + soup.findAll("description")[id].string + "</td>")
        #list.append("</tr>")
        if list:
            list="".join(list)

I can't find a way to list each type's properties (its child elements) together with the fruit name in a table. Everything I have tried so far displays the name, but when it reaches the banana it shows either both types of just the apple, or the types of both the apple and the banana.

I'm just using for loops in Python with BeautifulSoup and the lxml parser. Any help is appreciated!

CodePudding user response:

The code below collects all the information from the XML into a data structure that makes sense: a dictionary mapping each fruit name to a list of its types.

It does not use any external library, only xml.etree.ElementTree from the Python standard library.

import xml.etree.ElementTree as ET
from collections import defaultdict

xml = """
<fruits>
  <fruit>
    <name>apple</name>
    <types>
      <type>
        <color>red</color>
        <taste>sweet</taste>
        <size>big</size>
        <description>Nice, round, sweet red apple</description>
      </type>
      <type>
        <color>green</color>
        <taste>sour</taste>
        <size>medium</size>
        <description>Small, sour, green apple</description>
      </type>
    </types>
  </fruit>
  <fruit>
    <name>Banana</name>
    <types>
      <type>
        <color>yellow</color>
        <taste>sweet</taste>
        <size>small</size>
        <description>Good for banana-smoothies only</description>
      </type>
      <type>
        <color>green</color>
        <taste>Bitter</taste>
        <size>big</size>
        <description>Not quite ripe yet</description>
      </type>
    </types>
  </fruit>
</fruits>
"""
data = defaultdict(list)
root = ET.fromstring(xml)
for fruit in root.findall('.//fruit'):
    name = fruit.find('name').text
    for _type in fruit.findall('.//type'):
        data[name].append({x.tag: x.text for x in list(_type)})
for fruit, types in data.items():
    print(f'{fruit} -> {types}')

Output:

apple -> [{'color': 'red', 'taste': 'sweet', 'size': 'big', 'description': 'Nice, round, sweet red apple'}, {'color': 'green', 'taste': 'sour', 'size': 'medium', 'description': 'Small, sour, green apple'}]
Banana -> [{'color': 'yellow', 'taste': 'sweet', 'size': 'small', 'description': 'Good for banana-smoothies only'}, {'color': 'green', 'taste': 'Bitter', 'size': 'big', 'description': 'Not quite ripe yet'}]
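
Since the question asks for a table, here is a minimal sketch of how the same per-fruit search could be turned into HTML rows. It is only an illustration under a few assumptions: the XML is a shortened copy of the mockup, and the names COLUMNS, fruit_rows and to_html_table are hypothetical helpers, not part of any library. The key point is that the inner search starts from each fruit element rather than from the whole document, so one fruit's types never end up under another fruit's name.

```python
import xml.etree.ElementTree as ET
from html import escape

# Shortened copy of the question's XML so the sketch runs on its own.
XML = """
<fruits>
  <fruit>
    <name>apple</name>
    <types>
      <type><color>red</color><taste>sweet</taste><size>big</size>
        <description>Nice, round, sweet red apple</description></type>
      <type><color>green</color><taste>sour</taste><size>medium</size>
        <description>Small, sour, green apple</description></type>
    </types>
  </fruit>
  <fruit>
    <name>Banana</name>
    <types>
      <type><color>yellow</color><taste>sweet</taste><size>small</size>
        <description>Good for banana-smoothies only</description></type>
    </types>
  </fruit>
</fruits>
"""

# Hypothetical column order for the table; adjust to taste.
COLUMNS = ("color", "taste", "size", "description")


def fruit_rows(xml_text):
    """Return one (name, color, taste, size, description) tuple per <type>."""
    rows = []
    for fruit in ET.fromstring(xml_text).findall("fruit"):
        name = fruit.findtext("name")
        # Search only inside this <fruit>, so types never leak across fruits.
        for type_el in fruit.findall("./types/type"):
            rows.append((name, *(type_el.findtext(col, default="") for col in COLUMNS)))
    return rows


def to_html_table(rows):
    """Render the rows as a plain HTML table, escaping the cell text."""
    lines = ["<table>"]
    for row in rows:
        cells = "".join(f"<td>{escape(cell)}</td>" for cell in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


rows = fruit_rows(XML)
print(to_html_table(rows))
```

The same scoping idea should carry over to BeautifulSoup: loop over soup.find_all("fruit") and call find_all("type") on each fruit tag instead of on soup, rather than indexing into document-wide lists of color, taste and so on.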