Extract elements between two tags with Beautiful Soup and Python

Time:12-17

I want to scrape http://www.truellikon.ch/freizeit-kultur/anlaesse-agenda.html and extract the date and time of each event. On the page, each date is listed above its events. To get the full date and time I need to combine several divs, but there is no container that groups the events belonging to the same date. So the only approach I can see is to extract all events that sit between two date divs.

This is the code for extracting the event info:

from bs4 import BeautifulSoup
import requests

domain = 'truellikon.ch'
url = 'http://www.truellikon.ch/freizeit-kultur/anlaesse-agenda.html'

def get_website_news_links_truellikonCh():
    response = requests.get(url, allow_redirects=True)
    print("Response for", url, response)

    soup = BeautifulSoup(response.content, 'html.parser')
    # Every event is a div.eventItem; the date headings are separate divs
    all_events = soup.select('div.eventItem')

    for i in all_events:
        print(i)
        print()
        input()  # pause until Enter is pressed before showing the next event

get_website_news_links_truellikonCh()

The class name for the date is 'listThumbnailMonthName'. My question is: how can I combine these divs, and how should I write the selectors so that I get the exact date and time, title, and body of each event?

CodePudding user response:

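There is one parent container, #tx_nezzoagenda_list, and you have to read its children one by one. Each 'listThumbnailMonthName' div sets the current date, and every event that follows belongs to it until the next such div: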

import re

from bs4 import BeautifulSoup
import requests

url = 'http://www.truellikon.ch/freizeit-kultur/anlaesse-agenda.html'
response = requests.get(url)

soup = BeautifulSoup(response.text, 'html.parser')
container = soup.select_one('#tx_nezzoagenda_list')

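base_date = None  # most recent date heading seen so far

for child in container.children:
    # Skip text nodes (whitespace between the divs); only tags have a name
    if not child.name:
        continue
    # get() returns None for a tag without a class attribute, so default to []
    if 'listThumbnailMonthName' in child.get('class', []):
        # A date heading: it applies to every event until the next heading
        base_date = child.text.strip()
    else:
        day = child.select_one('.dateDayNumber').text.strip()
        title = child.select_one('.titleText').text.strip()
        location_date = child.select_one('.locationDateText').children
        # The time is the last node inside .locationDateText
        time_text = list(location_date)[-1].strip()
        time_text = re.sub(r'\s', '', time_text)
        print(title, day, base_date, time_text)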

which outputs

Abendunterhaltung TV Trüllikon 10 Dezember 2021 19:00Uhr-3:00Uhr
Christbaum-Verkauf 18 Dezember 2021 9:30Uhr-11:00Uhr
Silvester Party 31 Dezember 2021 22:00Uhr
Neujahrsapéro 02 Januar 2022 16:00Uhr-18:00Uhr
Senioren-Zmittag 21 Januar 2022 12:00Uhr-15:00Uhr
Theatergruppe "Nume Hüür", Aufführung 23 Januar 2022 13:00Uhr-16:00Uhr
Elektroschrottsammlung 29 Januar 2022 9:00Uhr-12:00Uhr
Senioren Z'mittag 18 Februar 2022 12:00Uhr-15:00Uhr
Frühlingskonzert 10 April 2022 12:17Uhr
Weinländer Musiktag 22 Mai 2022 8:00Uhr
Auffahrtskonzert Altersheim 26 Mai 2022 10:30Uhr
Feierabendmusik und Jubilarenehrung 01 Juli 2022 19:00Uhr
Feierabendmusik 15 Juli 2022 12:24Uhr
Feierabendmusik 19 August 2022 19:00Uhr
Herbstanlass 19 November 2022 20:00Uhr
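
Since the question asks for an exact date and time, the printed pieces can be combined into a `datetime`. The following is only a minimal sketch, not part of the original answer: `parse_event_start` and `GERMAN_MONTHS` are hypothetical names, and it assumes the heading always has the form "Monthname Year" (as in the output above) and that the first HH:MM in the time text is the start time. If no time is found, it falls back to midnight.

```python
import re
from datetime import datetime

# German month names as they appear in the headings, e.g. "Dezember 2021".
GERMAN_MONTHS = {
    'Januar': 1, 'Februar': 2, 'März': 3, 'April': 4, 'Mai': 5, 'Juni': 6,
    'Juli': 7, 'August': 8, 'September': 9, 'Oktober': 10,
    'November': 11, 'Dezember': 12,
}

def parse_event_start(day, base_date, time_text):
    """Hypothetical helper: build the start datetime of one event.

    day       -- day number as text, e.g. '10'
    base_date -- heading text, assumed to be 'Monthname Year'
    time_text -- e.g. '19:00Uhr-3:00Uhr' or '22:00Uhr'
    """
    month_name, year = base_date.split()
    # Take the first HH:MM; for a range this is the start time.
    match = re.search(r'(\d{1,2}):(\d{2})', time_text)
    if match:
        hour, minute = int(match.group(1)), int(match.group(2))
    else:
        hour, minute = 0, 0  # assumption: no time given means midnight
    return datetime(int(year), GERMAN_MONTHS[month_name], int(day), hour, minute)

print(parse_event_start('10', 'Dezember 2021', '19:00Uhr-3:00Uhr'))
# 2021-12-10 19:00:00
```

Using a lookup table avoids depending on a German locale being installed, which `datetime.strptime` with `%B` would require.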