Python: Changing bs4.element.ResultSet elements in list of lists to text

Time:03-24

Hi everyone, I have extracted some HTML elements from a website using BeautifulSoup and find_all. As a result I have a list of bs4.element.ResultSet objects (a list of lists) like this:

[[<li >neu</li>],
 [<li >neu</li>],
 [<li >neu</li>, <li >Terrasse</li>],
 [<li >neu</li>,
  <li >Terrasse</li>,
  <li >Parkplatz</li>]]

I would now like to retrieve the text within the bs4 elements and keep the same nested list structure. I have been experimenting with two nested loops.

fet = []
for feat in features_bs:
    for fets in feat:
        fet.append(fets.text)
    features.append(fet)

The outer loop iterates over every inner list (feat) in the original list (features_bs). The inner loop iterates over every element (fets) of that inner list (feat) and converts it to text. I want to append the text to a new list (fet) while keeping the original structure of lists inside lists. At the moment I only get one flat list of text like this:

['neu',
 'neu',
 'neu',
 'Terrasse',
 'neu',
 'Terrasse',
 'Parkplatz']

However I would like the output to be:

[['neu'],
 ['neu'],
 ['neu', 'Terrasse'],
 ['neu'],
 ['Terrasse'],
 ['Parkplatz']]

Thanks for the help in advance.

CodePudding user response:

You are close to your goal, but one thing is missing: a temporary list that is created fresh for each inner list.

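fet = []
for feat in features_bs:
    el = []  # fresh list for each inner ResultSet
    for fets in feat:
        el.append(fets.text)  # keep only the text of the tag
    fet.append(el)  # append the finished inner list
fet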

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
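
To see why the temporary list matters, here is a minimal sketch that uses plain strings as stand-ins for the bs4 elements (the sample data is made up for illustration). With one shared list, every item ends up in the same object, and that object is appended once per inner list. With a fresh list per iteration, the nesting survives.

```python
# Plain strings stand in for the bs4 elements; the sample data is illustrative only.
nested = [["neu"], ["neu", "Terrasse"]]

# Shared accumulator: every item lands in the same list, and that one
# list object is appended to the result once per inner list.
shared = []
broken = []
for inner in nested:
    for item in inner:
        shared.append(item)
    broken.append(shared)

# Fresh accumulator per inner list: the nesting is preserved.
fixed = []
for inner in nested:
    el = []
    for item in inner:
        el.append(item)
    fixed.append(el)

print(broken)  # [['neu', 'neu', 'Terrasse'], ['neu', 'neu', 'Terrasse']]
print(fixed)   # [['neu'], ['neu', 'Terrasse']]
```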

You could also streamline the process and build your expected format directly while parsing:

from bs4 import BeautifulSoup

html = '''
<ul>
<li >neu</li>
</ul>
<ul>
<li >neu</li>
</ul>
<ul>
<li >neu</li>
<li >Terrasse</li>
</ul>
<ul>
<li >neu</li>
</ul>
<ul>
<li >Terrasse</li>
</ul>
<ul>
<li >Parkplatz</li>
</ul>
'''

soup = BeautifulSoup(html, 'html.parser')  # name the parser explicitly
data = []
for ul in soup.find_all('ul'):
    el = []
    for e in ul.find_all('li'):
        el.append(e.text)  # append the text, not the Tag itself
    data.append(el)
data

Output:

[['neu'], ['neu'], ['neu', 'Terrasse'], ['neu'], ['Terrasse'], ['Parkplatz']]
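
The same idea can probably be written more compactly as a nested list comprehension. This is only a sketch: the HTML below is a hypothetical sample, not the real page, and get_text(strip=True) is used instead of .text on the assumption that surrounding whitespace should be trimmed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page will differ.
html = """
<ul><li> neu </li></ul>
<ul><li>neu</li><li>Terrasse</li></ul>
"""

soup = BeautifulSoup(html, "html.parser")

# One inner list per <ul>; get_text(strip=True) trims surrounding whitespace.
data = [
    [li.get_text(strip=True) for li in ul.find_all("li")]
    for ul in soup.find_all("ul")
]
print(data)  # [['neu'], ['neu', 'Terrasse']]
```

The inner comprehension creates a new list for every ul, so it plays the same role as the temporary list in the loop version.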