I use Python with requests to scrape label text from an HTML page, and I want to store the results in the DataFrame value_table. The code below fails at value_table.append(name.get_text().strip(),ignore_index=True). Can anyone help? Thanks!
import pandas as pd
import requests
from lxml import etree
from bs4 import BeautifulSoup
value_table=pd.DataFrame(columns=['value'])
url = 'https://www.ebay.com/itm/394079766930'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'lxml')
sku = soup.find('div','u-flL iti-act-num itm-num-txt').get_text(strip=True)
price = soup.find('span',{'itemprop':'price'}).get_text(strip=True)
div = soup.find('div', {'id': 'viTabs_0_is'})
divs = div.findAll('span', {'class': 'ux-textspans'})
for name in divs:
    print(name.get_text().strip() + ' ')
    value_table.append(name.get_text().strip(),ignore_index=True)
value_table['sku',:]=sku
value_table['price':]=price
CodePudding user response:
You're mostly on track. The only real changes needed are on the pandas side.
pandas reads a flat list as a column. You can use a linear algebra trick, the transpose (.T), so that the list, which is written as a row but brought in as a column, becomes a row as intended. Because of that, the line value_table=pd.DataFrame(columns=['value']) isn't needed. From there it's just a few lines.
So, keep everything above the for-loop except value_table=pd.DataFrame(columns=['value']), and replace everything from the for-loop down with this:
value_table=[]
for name in divs:
    value_table.append(name.get_text().strip())
# A flat list loads as one column; .T turns it into one row
values_table = pd.DataFrame(value_table).T
values_table['sku']=sku
values_table['price']=price
That will give you a single-row DataFrame: one column per scraped value, followed by the sku and price columns.
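To see the column-versus-row behaviour without hitting the network, here is a minimal sketch. The list contents, the sku and the price are made-up placeholders, not values from the actual page.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped values; the real text depends on the page.
value_table = ["Condition", "New", "Brand", "Acme"]
sku = "SKU-123"      # placeholder, not a real item number
price = "US $10.00"  # placeholder

# A flat list becomes a single column (4 rows x 1 column) ...
as_column = pd.DataFrame(value_table)
# ... and .T transposes it into a single row (1 row x 4 columns).
values_table = as_column.T
values_table["sku"] = sku
values_table["price"] = price

print(as_column.shape)     # (4, 1)
print(values_table.shape)  # (1, 6)
```

The transposed columns are simply numbered 0 to 3, so the frame says nothing about what each value means.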
For a future iteration, you might want to consider whether a dict suits your needs better.
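A rough sketch of what the dict route could look like. It assumes the scraped spans alternate label, value, label, value, which may not hold for every listing; the texts and the sku and price values are placeholders.

```python
import pandas as pd

# Hypothetical scraped texts, ASSUMED to alternate label, value, label, value.
texts = ["Condition", "New", "Brand", "Acme"]

# Pair the even-indexed items (labels) with the odd-indexed items (values).
record = dict(zip(texts[0::2], texts[1::2]))
record["sku"] = "SKU-123"      # placeholder
record["price"] = "US $10.00"  # placeholder

# A list containing one dict yields a one-row DataFrame with named columns.
values_table = pd.DataFrame([record])
print(values_table.columns.tolist())  # ['Condition', 'Brand', 'sku', 'price']
```

With named columns, a lookup such as values_table['Brand'] works directly, which the numbered columns from the transposed list do not allow.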
EDIT: I noticed in the comments you said: "currently i only want to strore them into one variable as string"
That's as simple as values_table['stringed']=str(value_table), but the result isn't particularly readable or easily searchable.
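If a single string is the goal, joining the items with a delimiter may read better than str(), which keeps the list's brackets and quotes. A small comparison, using placeholder values and an arbitrarily chosen " | " delimiter:

```python
import pandas as pd

# Hypothetical stand-ins for the scraped values.
value_table = ["Condition", "New", "Brand", "Acme"]

values_table = pd.DataFrame(value_table).T
# str() keeps the list syntax (brackets and quotes) ...
values_table["stringed"] = str(value_table)
# ... while join() produces plain delimited text.
values_table["joined"] = " | ".join(value_table)

print(values_table.loc[0, "stringed"])  # ['Condition', 'New', 'Brand', 'Acme']
print(values_table.loc[0, "joined"])    # Condition | New | Brand | Acme
```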