I am using bs4 (BeautifulSoup) to scrape a website that contains a list of years. I select the cells with:
years = soup.find_all('td', class_='EndCellSpacer')
which returns an array of matching tags:
[<td >
2014
</td>, <td >
2015
</td>, <td >
2016
</td>, <td >
2017
</td>, <td >
2018
</td>, <td >
2019
</td>, <td >
2020
</td>, <td >
2021
</td>]
I want the array to contain only the years, without the <td> tags. I have tried to use
years = soup.find_all('td', class_='EndCellSpacer').text.strip()
but I am getting this error message:
"ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"
If I call find(), it only returns the year from the first <td> tag, and I need all of them.
This might have something to do with the values being in an array, but I can't figure it out. I would greatly appreciate the help; this is my first time working in Python :/
CodePudding user response:
If you look at the result of
soup.find_all('td', class_='EndCellSpacer')
it is a list (a bs4 ResultSet), and the list itself has no .text attribute, so you need to iterate over it and get the text of each td tag:
out = [td.get_text().strip() for td in soup.find_all('td', class_='EndCellSpacer')]
Output:
['2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
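For anyone who wants to try this without the original page, here is a minimal self-contained sketch. The HTML string is a made-up stand-in for the real site (only the class name EndCellSpacer comes from the question), and the conversion to int is an optional extra step that the question did not ask for. It also uses get_text(strip=True), which should give the same result as calling .strip() on the text for cells like these.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the real page.
html = """
<table><tr>
<td class="EndCellSpacer">
2014
</td><td class="EndCellSpacer">
2015
</td><td class="Other">n/a</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all() returns a list-like ResultSet, so handle each tag individually.
cells = soup.find_all("td", class_="EndCellSpacer")

# get_text(strip=True) removes the surrounding whitespace and newlines.
years = [td.get_text(strip=True) for td in cells]

# Optional: convert the strings to integers if numbers are needed.
years_as_int = [int(y) for y in years]

print(years)
print(years_as_int)
```

The cell with a different class is skipped because find_all() only returns tags matching class_='EndCellSpacer'.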