IndentationError: expected an indented block in list

Time:09-04

from cgitb import text
from bs4 import BeautifulSoup
import requests

website = 'https://www.marketplacehomes.com/rent-a-home/'
result = requests.get(website)
content = result.text

soup = BeautifulSoup(content, 'html.parser')
lists = soup.find_all('div', class_=('tt-rental-row'))

for list in lists:
    location = list.find('span', class_="renta;-adress")
    beds = list.find('span', class_="renta;-beds")
    baths = list.find('span', class_="renta;-beds")
    availability = list.find('span', class_="rental-date-available")
    info = [location, beds, baths, availability]
    print(info)

If I try to run the last line of code, I get:

"IndentationError: expected an indented block"

If I try to run each indentation separately I get:

">>> location = list.find('span', class_="renta;-adress")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'list' has no attribute 'find'"

I'm new to Python and I'm kinda stuck, can anyone please help me?

CodePudding user response:

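list is the name of a built-in type in Python, not a keyword, so using it as a loop variable is allowed, but it shadows the built-in and makes errors harder to read. When you ran the line on its own at the prompt, list still referred to the built-in type, which has no find method, hence the AttributeError. Try another name: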

for myList in lists:
    location = myList.find('span', class_="renta;-adress")
    beds = myList.find('span', class_="renta;-beds")
    baths = myList.find('span', class_="renta;-beds")
    availability = myList.find('span', class_="rental-date-available")
    info = [location, beds, baths, availability]
    print(info)
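
A quick way to check whether a candidate variable name is a reserved keyword (which cannot be assigned at all) or a built-in name (which can be assigned, but gets shadowed) is sketched below with the standard keyword and builtins modules. The helper classify_name is hypothetical and only illustrates the distinction:

```python
import builtins
import keyword


def classify_name(name: str) -> str:
    """Return 'keyword', 'builtin', or 'free' for a candidate variable name."""
    if keyword.iskeyword(name):
        # Reserved words such as 'for' or 'class' raise a SyntaxError when assigned.
        return "keyword"
    if hasattr(builtins, name):
        # Built-in names such as 'list' can be reassigned, which hides the original.
        return "builtin"
    return "free"


for candidate in ("list", "for", "row"):
    print(candidate, "->", classify_name(candidate))
```

A name reported as 'builtin' still works as a variable, but choosing a 'free' name avoids surprises later in the same session.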

CodePudding user response:

The data cannot be extracted from the page's HTML with bs4, because the listings are populated dynamically from an API.

Example:

import requests
import pandas as pd

# The listings are served as JSON by this endpoint, not embedded in the page HTML.
api_url = 'https://app.tenantturner.com/listings-json/2679'
req = requests.get(api_url).json()

# Collect one dict per listing with only the fields of interest.
info = []
for item in req:
    location = item['address']
    beds = item['beds']
    baths = item['baths']
    availability = item['dateAvailable']
    info.append({
        'location': location,
        'beds': beds,
        'baths': baths,
        'availability': availability})

df = pd.DataFrame(info)
print(df)

Output:

                location beds baths availability
0     4481 Jack Faulk St    4     2          Now
1    213 Skybranch Court    3   2.5          Now
2    8127 Olive Brook Dr    3   2.0          Now
3          735 Grace Ave    3   2.0          Now
4         1810 E 41st St    3   1.0          Now
..                   ...  ...   ...          ...
86  447 Union Station St    2   2.5          Now
87  5819 Fairdale Lane C    3   3.5          Now
88      10020 Braxton Dr    4     3          Now
89        709 Hilchot Dr    3   2.5          Now
90        5042 Pike Loop    3   2.5          Now

[91 rows x 4 columns]
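
Indexing with item['address'] raises a KeyError if a record lacks that key. Whether the real endpoint ever omits fields is an assumption here, but a more tolerant extraction step could look like the sketch below. The sample JSON is hypothetical, shaped after the output above, and the helper extract_fields is not part of the original answer:

```python
import json

# Hypothetical sample shaped like the records in the output above;
# the second record deliberately lacks 'dateAvailable'.
SAMPLE_JSON = """
[
  {"address": "4481 Jack Faulk St", "beds": 4, "baths": 2, "dateAvailable": "Now"},
  {"address": "213 Skybranch Court", "beds": 3, "baths": 2.5}
]
"""

# Output column name -> key in the raw JSON record.
FIELDS = {
    "location": "address",
    "beds": "beds",
    "baths": "baths",
    "availability": "dateAvailable",
}


def extract_fields(items):
    """Map each raw listing to the four columns, using None for missing keys."""
    return [{column: item.get(key) for column, key in FIELDS.items()} for item in items]


rows = extract_fields(json.loads(SAMPLE_JSON))
for row in rows:
    print(row)
```

The resulting list of dicts can be passed to pd.DataFrame in the same way as info above.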

CodePudding user response:

As @F.Hoque also mentioned, the data is requested from another resource. You could also use pandas to create a DataFrame directly and slice it to your needs:

import pandas as pd
pd.read_json('https://app.tenantturner.com/listings-json/2679')

Output:

    id  dateActivated   latitude    longitude   address city    state   zip photo   title   ... baths   dateAvailable   rentAmount  acceptPets  applyUrl    btnUrl  btnText virtualTour propertyType    enableWaitlist
0   83600   8/22/2022   35.750499   -86.393972  4481 Jack Faulk St  Murfreesboro    TN  37127   https://ttimages.blob.core.windows.net/propert...   4481 Jack Faulk St  ... 2.0 Now 2195    cats, small dogs, large dogs    https://app.propertyware.com/pw/application/#/...   https://app.tenantturner.com/qualify/4481-jack...   Schedule Viewing    None    Single Family   False
1   100422  8/31/2022   30.277607   -95.472842  213 Skybranch Court Conroe  TX  77304   https://ttimages.blob.core.windows.net/propert...   213 Skybranch Court ... 2.5 Now 2100    cats, small dogs, large dogs    https://app.propertyware.com/pw/application/#/...   https://app.tenantturner.com/qualify/213-skybr...   Schedule Viewing    None    Condo Unit  False
2   106976  7/27/2022   28.274720   -82.298077  8127 Olive Brook Dr Wesley Chapel   FL  33545   https://ttimages.blob.core.windows.net/propert...   8127 Olive Brook Dr ... 2.0 Now 2650    no pets https://app.propertyware.com/pw/application/#/...   https://app.tenantturner.com/qualify/8127-oliv...   Schedule Viewing    None    Single Family   False
3   116188  8/15/2022   42.624023   -83.144614  735 Grace Ave   Rochester Hills MI  48307   https://ttimages.blob.core.windows.net/propert...   735 Grace Ave   ... 2.0 Now 1600    cats, small dogs, large dogs    https://app.propertyware.com/pw/application/#/...   https://app.tenantturner.com/qualify/735-grace...   Schedule Viewing    None    Single Family   False
4   126846  8/22/2022   32.046455   -81.071181  1810 E 41st St  Savannah    GA  31404   https://ttimages.blob.core.windows.net/propert...   1810 E 41st St  ... 1.0 Now 1395    small dogs  https://app.propertyware.com/pw/application/#/...   https://app.tenantturner.com/qualify/1810-e-41...   Schedule Viewing    None    Single Family   True

...
91 rows × 22 columns

Example:

To show only specific columns, simply pass a list of their names:

import pandas as pd
# Select only the four columns shown in the output below.
pd.read_json('https://app.tenantturner.com/listings-json/2679')[['address', 'beds', 'baths', 'dateAvailable']]

Output:

    address beds    baths   dateAvailable
0   4481 Jack Faulk St  4   2.0 Now
1   213 Skybranch Court 3   2.5 Now
2   8127 Olive Brook Dr 3   2.0 Now
3   735 Grace Ave   3   2.0 Now
4   1810 E 41st St  3   1.0 Now
... ... ... ... ...

91 rows × 4 columns

Note: Your code runs without error, but the body of the for loop never executes because your selection matches no elements in the HTML. The elements are generated dynamically from data served by another resource, and requests does not render a page the way a browser does; it only returns the static content of the response.
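
A minimal way to confirm this is to count the matching elements before looping. The sketch below uses the standard library's html.parser instead of bs4 so that it runs without third-party packages. STATIC_HTML is a hypothetical stand-in for what requests receives, and ClassCounter is an illustrative helper, not part of bs4:

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the static HTML that requests.get() receives:
# the listing container stays empty until JavaScript fills it in a browser.
STATIC_HTML = """
<html><body>
  <div id="listings"></div>
  <script src="listings.js"></script>
</body></html>
"""


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


counter = ClassCounter("tt-rental-row")
counter.feed(STATIC_HTML)

if counter.count == 0:
    print("No matching elements in the static HTML; the data is probably loaded by JavaScript.")
```

With bs4 the equivalent check is simply whether find_all returned an empty list.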

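Also avoid using built-in names such as list as variable names: doing so shadows the built-in and leads to confusing errors. Actual keywords (for example for or class) cannot be used as names at all; see the keyword list in the Python documentation.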
