I'm trying to merge the text element in the rlg-item__paint class with the text element in the rlg-trade__itemshas class, like so:
import requests
from bs4 import BeautifulSoup

url = "https://rocket-league.com/trade/465ec00f-2f5c-48e2-831e-2e294683ad56"
response = requests.get(f"{url}")
soup = BeautifulSoup(response.text, "html.parser")
for has in soup.findAll('div', attrs={'class': 'rlg-trade__itemshas'}):
    # remove the link blocks so their text is not printed
    for div in soup.findAll('div', attrs={'class': 'rlg-item-links'}):
        div.extract()
    # attempt to pull the paint colour onto the item's line
    for color in soup.findAll('div', attrs={'class': 'rlg-item__paint'}):
        color.replaceWith('\n', color)
    items = (has.get_text(f"\n"' ', strip=True))
    print(items)
But it doesn't work. Output:
Magma
Pink
Light Show
Cristiano
Anodized Pearl
Pink is the text element from the rlg-item__paint class. I want to merge it like this:
Magma
Pink Light Show
Cristiano
Anodized Pearl
So I want to merge it into the text element on the row below it.
CodePudding user response:
Note: In newer code, avoid the old findAll() syntax; use find_all() or select() with CSS selectors instead. For more, take a minute to check the docs.
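As a small illustration of that note, the sketch below runs find_all() and select() against the same element. The HTML string is a made-up stand-in, not the real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet; the live page's markup may differ.
html = '<div class="rlg-item__paint">Pink</div><div class="other">x</div>'
soup = BeautifulSoup(html, "html.parser")

# Current method name, filtering by tag name and class
by_find_all = soup.find_all("div", class_="rlg-item__paint")

# The same lookup written as a CSS selector
by_select = soup.select("div.rlg-item__paint")

paint_texts = [tag.get_text(strip=True) for tag in by_select]
print(paint_texts)
```

Both calls return the matching tags; select() tends to be more compact once the lookup involves nested elements.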
If the pattern is always the same, you could select your elements more specifically, extract the text with .stripped_strings, and slice off the <a> texts:
for e in soup.select('.rlg-trade__itemshas .--hover'):
    print(' '.join(list(e.stripped_strings)[:-2]))
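To see what the [:-2] slice does in isolation, here is a minimal sketch on a plain list. The helper name and the two link texts ("Item details", "Price") are hypothetical; the approach only assumes that each item ends with exactly two link strings.

```python
def join_without_trailing_links(strings, link_count=2):
    """Join an item's strings with spaces, dropping the trailing link texts."""
    strings = list(strings)
    # Keep everything except the last `link_count` entries
    kept = strings[:len(strings) - link_count]
    return " ".join(kept)

# Hypothetical strings as .stripped_strings might yield them for one item
example = ["Pink", "Light Show", "Item details", "Price"]
print(join_without_trailing_links(example))
```

If the number of links per item varies, the slice would cut off too much or too little, so this approach depends on the page keeping a fixed pattern.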
Alternatively, you could use .decompose() to get rid of the links:
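for e in soup.select('.rlg-trade__itemshas .--hover'):
    links = e.select_one('.rlg-item-links')
    if links is not None:  # skip items that have no link block
        links.decompose()
    print(e.get_text(' ', strip=True))  # ' ' keeps "Pink" and "Light Show" apart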
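The .decompose() approach can be tried offline with a small sketch like the one below. The markup, the .rlg-item wrapper class, the link texts, and the item_names() helper are assumptions for illustration, not the site's actual structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page; the real structure may differ.
html = """
<div class="rlg-trade__itemshas">
  <div class="rlg-item">
    <div class="rlg-item__paint">Pink</div>
    <h2>Light Show</h2>
    <div class="rlg-item-links"><a href="#">Item details</a><a href="#">Price</a></div>
  </div>
  <div class="rlg-item">
    <h2>Magma</h2>
    <div class="rlg-item-links"><a href="#">Item details</a><a href="#">Price</a></div>
  </div>
</div>
"""

def item_names(markup):
    """Return one string per item, with link texts removed."""
    soup = BeautifulSoup(markup, "html.parser")
    names = []
    for item in soup.select(".rlg-trade__itemshas .rlg-item"):
        links = item.select_one(".rlg-item-links")
        if links is not None:
            links.decompose()  # remove the link block from the tree
        # Join the remaining text pieces with a single space
        names.append(item.get_text(" ", strip=True))
    return names

print(item_names(html))
```

Unlike the slicing approach, this does not depend on how many links each item has, only on the links living inside a .rlg-item-links container.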
Example
from bs4 import BeautifulSoup
import requests

url = "https://rocket-league.com/trade/465ec00f-2f5c-48e2-831e-2e294683ad56"
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

for e in soup.select('.rlg-trade__itemshas .--hover'):
    print(' '.join(list(e.stripped_strings)[:-2]))
Output
Magma
Pink Light Show
Cristiano
Anodized Pearl