How to scrape texts in order / merge texts?

Time:09-06

I'm trying to merge the text of the rlg-item__paint element with the item text inside the rlg-trade__itemshas element, like so:

import requests
from bs4 import BeautifulSoup

url = "https://rocket-league.com/trade/465ec00f-2f5c-48e2-831e-2e294683ad56"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
for has in soup.findAll('div', attrs={'class': 'rlg-trade__itemshas'}):
    for div in soup.findAll('div', attrs={'class': 'rlg-item-links'}):
        div.extract()
    for color in soup.findAll('div', attrs={'class': 'rlg-item__paint'}):
        color.replaceWith('\n', color)
    items = has.get_text("\n ", strip=True)
    print(items)

But it doesn't work. The output is:

Magma
Pink
Light Show
Cristiano
Anodized Pearl 

Pink is the text of the rlg-item__paint element. I want to merge it like this:

Magma
Pink Light Show
Cristiano
Anodized Pearl

In other words, I want the paint text joined onto the row of text below it.

Answer:

Note: In newer code, avoid the old syntax findAll(); use find_all() or select() instead. For more, take a minute to check the docs.
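For illustration, a minimal sketch on a made-up snippet (the two paint divs below are hypothetical, not taken from the trade page) showing that find_all() and a CSS selector passed to select() find the same elements:

```python
from bs4 import BeautifulSoup

# Hypothetical snippet, not the real trade page markup.
html = '<div class="rlg-item__paint">Pink</div><div class="rlg-item__paint">Black</div>'
soup = BeautifulSoup(html, "html.parser")

# find_all() is the current name; findAll() is the older camelCase alias.
by_find_all = [d.get_text() for d in soup.find_all("div", class_="rlg-item__paint")]

# select() takes a CSS selector instead of tag/attribute arguments.
by_select = [d.get_text() for d in soup.select("div.rlg-item__paint")]

print(by_find_all, by_select)
```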


If the pattern is always the same, you could select your elements more specifically, extract the text with .stripped_strings, and slice off the <a> texts:

for e in soup.select('.rlg-trade__itemshas .--hover'):
    print(' '.join(list(e.stripped_strings)[:-2]))
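To make the slicing concrete, here is a self-contained sketch that runs the same loop on hypothetical markup instead of the live page. It assumes each item ends with an .rlg-item-links block holding exactly two <a> texts ("Prices" and "Search" are invented here purely for illustration); the rlg-item class and the <h2> tags are assumptions as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on two trade items; the real page may differ.
html = """
<div class="rlg-trade__itemshas">
  <div class="rlg-item --hover">
    <h2>Magma</h2>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
  <div class="rlg-item --hover">
    <div class="rlg-item__paint">Pink</div>
    <h2>Light Show</h2>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for e in soup.select(".rlg-trade__itemshas .--hover"):
    # stripped_strings yields every non-blank text node with whitespace trimmed;
    # [:-2] drops the two trailing link texts, the rest is joined with a space.
    lines.append(" ".join(list(e.stripped_strings)[:-2]))

print("\n".join(lines))
```

If an item ever has a different number of links, the [:-2] slice cuts the wrong strings, so this only holds while the pattern really is always the same.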

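Or you could use .decompose() to get rid of the links and then call get_text() with a space separator:

for e in soup.select('.rlg-trade__itemshas .--hover'):
    e.select_one('.rlg-item-links').decompose()
    # without the ' ' separator, "Pink" and "Light Show" would be glued together
    print(e.get_text(' ', strip=True))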
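A runnable sketch of the decompose() variant on the same kind of hypothetical markup (the rlg-item class, the <h2> tags, and the link texts are invented for illustration). It also guards against items without a link block, since select_one() returns None when nothing matches:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may differ.
html = """
<div class="rlg-trade__itemshas">
  <div class="rlg-item --hover">
    <h2>Magma</h2>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
  <div class="rlg-item --hover">
    <div class="rlg-item__paint">Pink</div>
    <h2>Light Show</h2>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

lines = []
for e in soup.select(".rlg-trade__itemshas .--hover"):
    links = e.select_one(".rlg-item-links")
    if links is not None:  # skip items that have no link block
        links.decompose()  # removes the tag and everything inside it from the tree
    # A space separator keeps "Pink" and "Light Show" apart.
    lines.append(e.get_text(" ", strip=True))

print("\n".join(lines))
```

Unlike the slicing version, this does not depend on how many link texts an item has.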

Example

from bs4 import BeautifulSoup
import requests

url = "https://rocket-league.com/trade/465ec00f-2f5c-48e2-831e-2e294683ad56"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

for e in soup.select('.rlg-trade__itemshas .--hover'):
    print(' '.join(list(e.stripped_strings)[:-2]))

Output

Magma
Pink Light Show
Cristiano
Anodized Pearl
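If you'd rather not depend on the order of the strings at all, one option is to read the rlg-item__paint text explicitly and put it in front of the remaining name. The sketch below uses a hypothetical helper, item_label(), and invented markup in which the paint div comes after the name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the paint div of "Light Show" sits *after* the name
# here, to show that element order does not matter for this approach.
html = """
<div class="rlg-trade__itemshas">
  <div class="rlg-item --hover">
    <h2>Magma</h2>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
  <div class="rlg-item --hover">
    <h2>Light Show</h2>
    <div class="rlg-item__paint">Pink</div>
    <div class="rlg-item-links"><a href="#">Prices</a><a href="#">Search</a></div>
  </div>
</div>
"""


def item_label(item):
    """Hypothetical helper: return 'Paint Name', or just 'Name' if unpainted.

    It decomposes the paint and link tags, so it mutates the tree.
    """
    paint = item.select_one(".rlg-item__paint")
    paint_text = paint.get_text(strip=True) if paint is not None else ""
    # Drop the paint and link blocks so only the item name is left.
    for tag in item.select(".rlg-item__paint, .rlg-item-links"):
        tag.decompose()
    name = item.get_text(" ", strip=True)
    return f"{paint_text} {name}".strip()


soup = BeautifulSoup(html, "html.parser")
labels = [item_label(e) for e in soup.select(".rlg-trade__itemshas .--hover")]
print("\n".join(labels))
```

Because it decomposes tags, run it on a fresh soup if you still need the full markup afterwards.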