I'm unsure how to print the rest of the information I need from HTML

Time:06-16

import requests
from bs4 import BeautifulSoup
from datetime import datetime
from dateutil.relativedelta import relativedelta

evr_begin = datetime.now().strftime("%m/%d/%Y")
evr_end = (datetime.now() + relativedelta(months=1)).strftime("%m/%d/%Y")
url = "https://mms.kcbs.us/members/evr_search_ol_json.php?" \
      f"otype=TEXT&evr_map_type=2&org_id=KCBA&evr_begin={evr_begin}&evr_end={evr_end}&" \
      "evr_radius=50&evr_type=269&evr_region_type=1"
response = requests.request("GET", url)
soup = BeautifulSoup(response.text, features='lxml')
for event in soup.find_all('div', class_='row'):
    print(event.find('b').getText())
    print(event.find('i').getText())

Link to the website: https://mms.kcbs.us/members/evr_search.php?org_id=KCBA

I'm unsure how to print what comes after the information I'm already printing. Part of the issue is that some of the other text shares the same tag, and for the rest I'm not sure which tag to target.

For example, for the first event I need to print

Frisco, CO 80443 UNITED STATES STATE CHAMPIONSHIP Reps: BUNNY TUTTLE, RICH TUTTLE, MICHAEL WINTER Prize Money: $13,050.00

all separately.
If I use print(event.find('div', class_='col-md-4').getText()) within the for loop, it prints the text clumped together.

CodePudding user response:

What I would do is create a dictionary that maps each piece of data to a name, based on the order in which it appears in each row of the table. Then collect each row into its own dictionary and append it to a list that you can work with once parsing is finished.

For example:

import requests
from bs4 import BeautifulSoup
from datetime import datetime
from dateutil.relativedelta import relativedelta
import json

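# Field names keyed first by column index within a row,
# then by the position of the text inside that column.
data = {
    0:{ 0:"title", 1:"dates", 2:"city/state", 3:"country" },
    1:{ 0:"event", 1:"reps", 2:"prize" },
    2:{ 0:"results" }
}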

evr_begin = datetime.now().strftime("%m/%d/%Y")
evr_end = (datetime.now() + relativedelta(months=1)).strftime("%m/%d/%Y")
url = f"https://mms.kcbs.us/members/evr_search_ol_json.php?otype=TEXT&evr_map_type=2&org_id=KCBA&evr_begin={evr_begin}&evr_end={evr_end}&evr_radius=50&evr_type=269&evr_region_type=1"
response = requests.request("GET", url)
print(response.content)
soup = BeautifulSoup(response.text, features='lxml')
all_data = []
for element in soup.find_all('div', class_="row"):
    event = {}
    for i, col in enumerate(element.find_all('div', class_='col-md-4')):
        for j, item in enumerate(col.strings):
            event[data[i][j]] = item
    all_data.append(event)

print(json.dumps(all_data,indent=4))

The output would look something like this:

[
    {
        "title": "Frisco BBQ Challenge",
        "dates": "6/16/2022 - 6/18/2022",
        "city/state": "Frisco, CO 80443",
        "country": "UNITED STATES",
        "event": "STATE CHAMPIONSHIP",
        "reps": "Reps: BUNNY TUTTLE, RICH TUTTLE, MICHAEL WINTER",
        "prize": "Prize Money: $13,050.00",
        "results": "Results Not In"
    },
    {
        "title": "York County BBQ Festival",
        "dates": "6/17/2022 - 6/18/2022",
        "city/state": "Delta, PA 17314",
        "country": "UNITED STATES",
        "event": "STATE CHAMPIONSHIP",
        "reps": "Reps: ANGELA MCKEE, ROBERT MCKEE, LOUISE WEIDNER",
        "prize": "Prize Money: $5,500.00",
        "results": "Results Not In"
    },
...
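
One caveat with the approach above: col.strings also yields whitespace-only strings when the markup is indented, and data[i][j] raises a KeyError if a column holds more pieces of text than expected. Below is a minimal sketch of a more forgiving variant. It uses stripped_strings and falls back to a generated key for unexpected items. SAMPLE_HTML is a made-up snippet shaped like one row of the output shown above, so the real page's markup may differ, and the names FIELD_NAMES, parse_rows and the extra_ keys are my own, chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single row; the real response may be structured differently.
SAMPLE_HTML = """
<div class="row">
  <div class="col-md-4">
    <b>Frisco BBQ Challenge</b><br>
    <i>6/16/2022 - 6/18/2022</i><br>
    Frisco, CO 80443<br>
    UNITED STATES
  </div>
  <div class="col-md-4">
    STATE CHAMPIONSHIP<br>
    Reps: BUNNY TUTTLE, RICH TUTTLE, MICHAEL WINTER<br>
    Prize Money: $13,050.00
  </div>
  <div class="col-md-4">
    Results Not In
  </div>
</div>
"""

# Field names per column, in the order the text appears inside that column.
FIELD_NAMES = [
    ["title", "dates", "city/state", "country"],
    ["event", "reps", "prize"],
    ["results"],
]


def parse_rows(html):
    """Return one dict per div.row, labelling each piece of text by its position."""
    soup = BeautifulSoup(html, "html.parser")
    events = []
    for row in soup.find_all("div", class_="row"):
        event = {}
        for i, col in enumerate(row.find_all("div", class_="col-md-4")):
            names = FIELD_NAMES[i] if i < len(FIELD_NAMES) else []
            # stripped_strings skips whitespace-only strings and trims the rest.
            for j, text in enumerate(col.stripped_strings):
                # Unexpected extra text gets a generated key instead of raising.
                key = names[j] if j < len(names) else f"extra_{i}_{j}"
                event[key] = text
        events.append(event)
    return events


events = parse_rows(SAMPLE_HTML)
for event in events:
    for key, value in event.items():
        print(f"{key}: {value}")
```

Each value ends up under its own key, so you can print or process the fields one at a time, for example print(events[0]["prize"]).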