Hello everyone. I'm scraping a table and splitting its header and body into separate lists, but the body data contains a lot of '\n' characters. I'm trying to remove them, but I can't seem to get them out.
code:
from bs4 import BeautifulSoup

# driver is assumed to be an already-initialized Selenium WebDriver
soup = BeautifulSoup(driver.page_source, 'html.parser')
table = soup.find("table")
rows = table.find_all("tr")
table_contents = []
for tr in rows:
    if rows.index(tr) == 0:
        # First row: header cells
        row_cells = [th.getText().strip() for th in tr.find_all('th') if th.getText().strip() != '']
    else:
        # Other rows: optional row header followed by the data cells
        row_cells = ([tr.find('th').getText()] if tr.find('th') else []) + [td.getText().strip() for td in tr.find_all('td') if td.getText().strip() != '']
    if len(row_cells) > 1:
        table_contents += [row_cells]
table_head = table_contents[0]
table_body = table_contents[1]
print(table_head)
print(table_body)
Results:
table head= ['Student Number', 'Student Name', 'Placement Date']
table body= ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']
As you can see in the table body results, '\n' is in the way and I can't figure out how to get rid of it. I have hundreds of samples to pull with the same issue.
CodePudding user response:
Use str.replace() with a list comprehension:
[i.replace('\n', '') for i in table_body]
Output:
['20808456', 'Sandy(f) Gurlow', '01/13/2023']
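Note that removing '\n' outright glues 'Sandy' and '(f)' together. If you would rather keep the words separated, one possible alternative (a minimal sketch, not the only way to do it) is to collapse every run of whitespace into a single space with str.split() and str.join(). The helper name clean_cell is made up for illustration, and the sample list is copied from the question:

```python
def clean_cell(text):
    # str.split() with no arguments splits on any run of whitespace
    # (spaces, tabs, newlines), so re-joining with a single space
    # normalizes the text. clean_cell is a hypothetical helper name.
    return " ".join(text.split())


# Sample row taken from the question's output
table_body = ['20808456', 'Sandy\n(f) \nGurlow', '01/13/2023']

cleaned = [clean_cell(cell) for cell in table_body]
print(cleaned)
# ['20808456', 'Sandy (f) Gurlow', '01/13/2023']
```

The same comprehension can be applied to each row when you loop over many tables, so it should scale to the hundreds of samples mentioned in the question.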