I use python/requests to get the HTML label information, want to store the result into pandas data.f

I use python/requests to get the HTML label information and want to store the result in a DataFrame, value_table. The code below doesn't work at value_table.append(name.get_text().strip(),ignore_index=True). Can anyone help? Thanks!

import pandas as pd
import requests
from lxml import etree
from bs4 import BeautifulSoup
value_table=pd.DataFrame(columns=['value'])
url = 'https://www.ebay.com/itm/394079766930'
req = requests.get(url)
soup = BeautifulSoup(req.content, 'lxml')

sku = soup.find('div','u-flL iti-act-num itm-num-txt').get_text(strip=True)
price = soup.find('span',{'itemprop':'price'}).get_text(strip=True)

div = soup.find('div', {'id': 'viTabs_0_is'})
divs = div.findAll('span', {'class': 'ux-textspans'})
for name in divs:
    print(name.get_text().strip() + ' ')
    value_table.append(name.get_text().strip(),ignore_index=True)

value_table['sku',:]=sku
value_table['price':]=price

CodePudding user response：

You're mostly on track. The only real modifications needed are in the pandas-related parts.

pandas reads a flat list as a single column, one row per item. Transposing the DataFrame with .T turns that column into a single row, which is the layout intended here. Because the DataFrame is built from the list in one step, value_table=pd.DataFrame(columns=['value']) isn't needed. From there it's just a few lines.
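To make the column-versus-row point concrete, here is a minimal sketch. The strings in `scraped` are made-up stand-ins for the span texts, not real values from the page. Collecting into a plain list first also sidesteps DataFrame.append, which returned a new frame instead of modifying in place and was removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical stand-ins for the scraped span texts (assumed, not from the page).
scraped = ["Condition:", "New", "Brand:", "Unbranded"]

as_column = pd.DataFrame(scraped)  # one column, one row per list item
as_row = as_column.T               # transposed: one row, one column per item

print(as_column.shape)  # (4, 1)
print(as_row.shape)     # (1, 4)
```

The transposed frame has integer column labels 0..3, so adding named columns such as 'sku' afterwards just appends them on the right.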

So, keep everything above the for-loop, except for value_table=pd.DataFrame(columns=['value']), and replace everything from the for-loop down with this:

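value_table=[]
for name in divs:
    # collect the cleaned text of each span in a plain Python list
    value_table.append(name.get_text().strip())

# a flat list becomes one column; .T turns it into a single row
values_table = pd.DataFrame(value_table).T
values_table['sku']=sku
values_table['price']=price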

That will give you a single-row DataFrame: one column per scraped value, followed by the sku and price columns.

For a future iteration, you might want to consider whether a dict suits your needs better.
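As a rough idea of what the dict route could look like, here is a sketch. The `sku`, `price`, and `value_table` values are invented placeholders for what the soup lookups would return, and the key names are just one possible choice.

```python
import pandas as pd

# Hypothetical placeholders; in the real script these come from the soup lookups.
sku = "394079766930"
price = "US $10.00"
value_table = ["Condition:", "New", "Brand:", "Unbranded"]

# One dict per scraped item keeps every field under a readable name.
record = {"sku": sku, "price": price, "values": value_table}

# A list of such dicts becomes a DataFrame with one row per item.
items = pd.DataFrame([record])
print(items.columns.tolist())  # ['sku', 'price', 'values']
```

If you later scrape several listings, you can append one dict per listing to a list and build the DataFrame once at the end.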

EDIT: I noticed in the comments you said: "currently i only want to store them into one variable as string"

That's as simple as values_table['stringed']=str(value_table), but the result isn't particularly readable or easily searchable.
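If readability matters, joining the items with a separator is one alternative to str(). This is only a sketch: the list contents and the " | " separator are assumptions chosen for illustration.

```python
# Hypothetical stand-ins for the scraped span texts.
value_table = ["Condition:", "New", "Brand:", "Unbranded"]

# str() keeps the list punctuation (brackets, quotes, commas).
as_repr = str(value_table)

# join() produces plain text with a separator of your choosing.
as_text = " | ".join(value_table)

print(as_repr)  # ['Condition:', 'New', 'Brand:', 'Unbranded']
print(as_text)  # Condition: | New | Brand: | Unbranded
```

The joined form can be split back into a list with as_text.split(" | "), as long as the separator never appears inside a value.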