I have been scraping a product from this website: https://www.adamhall.com/shop/gi-en/cables-connectors/pre-assembled-cables/microphone-cables/3323/4-star-mmf-1000. I collect the image links as a list and store them in a product dictionary, which I want to load into a DataFrame so that the whole list is a single cell value in the 'image' column. However, the resulting DataFrame has as many rows as there are image links.
Here is my code so far:
from requests_html import HTMLSession
import pandas as pd

url = 'https://www.adamhall.com/shop/gi-en/cables-connectors/pre-assembled-cables/microphone-cables/3323/4-star-mmf-1000'
# product_properties=

def get_product(url):
    s = HTMLSession()
    r = s.get(url)
    # Collect the full-size image URL from every zoomable product image
    images = r.html.find('img.js-zoom-image')
    links = []
    for image in images:
        link = image.attrs['data-zoom']
        links.append(link)
    product = {
        'id': r.html.find('div.right-item', first=True).text.strip(),
        'title': r.html.find('h1.articlename', first=True).text.strip().replace('\n', ' '),
        'description': r.html.find('div.description >p', first=True).text.strip(),
        'details': r.html.find('div.js-accordion__content.specification__content', first=True).text.strip(),
        'image': links,
    }
    return product

AHdf = pd.DataFrame(get_product(url))
print(AHdf)
Here is what gets returned:
id ... image
0 K4MMF1000 ... https://cdn-shop.adamhall.com/ORIGINAL/media/M...
1 K4MMF1000 ... https://cdn-shop.adamhall.com/ORIGINAL/media/M...
I would like it to have just one row, with all the image links as a list of items, separated by a comma in one cell in the 'image' column.
CodePudding user response:
Just wrap the dictionary returned by your function in a list:
#                   v----------------v
AHdf = pd.DataFrame([get_product(url)])
print(AHdf)
# Output
id title description details image
0 K4MMF1000 Adam Hall Cables 4 STAR MMF 1000 Professional, balanced microphone cable practi... Cable Length\n10 m\nColor\nBlack\nCable diamet... [https://cdn-shop.adamhall.com/ORIGINAL/media/...
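To see why the brackets matter without hitting the website, here is a minimal sketch. The `product` dict is a hypothetical stand-in for what `get_product(url)` returns, and the example.com links are placeholders. When pandas is given a dict that mixes scalars with a list, it treats the list as a column and repeats the scalars on every row; a list containing one dict is read as one record instead.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by get_product(url)
product = {
    'id': 'K4MMF1000',
    'title': 'Adam Hall Cables 4 STAR MMF 1000',
    'image': ['https://example.com/a.jpg', 'https://example.com/b.jpg'],
}

# Dict of columns: the list becomes a 2-row column, scalars are repeated
expanded = pd.DataFrame(product)

# List of records: one dict -> one row, the list stays inside a single cell
single = pd.DataFrame([product])

# Optional: turn the list into one comma-separated string
joined = single['image'].str.join(', ')

print(expanded.shape)
print(single.shape)
print(joined[0])
```

If a plain string is preferable to a list object in the cell (for example before writing to CSV), the `str.join` step is one way to get it.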
Another way is to use json_normalize:
AHdf = pd.json_normalize(get_product(url))
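A small offline sketch of the same idea, again using a hypothetical `product` dict with placeholder links. `pd.json_normalize` treats a single dict as one record, so list values stay in one cell. The `specs` key is an assumption added only to show that nested dicts, unlike lists, are flattened into dotted column names.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by get_product(url);
# 'specs' is an illustrative nested field, not part of the original scraper
product = {
    'id': 'K4MMF1000',
    'image': ['https://example.com/a.jpg', 'https://example.com/b.jpg'],
    'specs': {'length': '10 m'},
}

df = pd.json_normalize(product)

print(df.shape)
print(sorted(df.columns))
print(df.loc[0, 'image'])
```

For a flat dict like the one in the question, this gives the same single row as wrapping the dict in a list.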
CodePudding user response:
You can keep the DataFrame you already have and collapse it with groupby and agg:
out = AHdf.groupby(AHdf.columns.difference(['image']).tolist())['image'].agg(list).reset_index()
Output:
description details id title image
0 Professional, balanced microphone cable practi... Cable Length\n10 m\nColor\nBlack\nCable diamet... K4MMF1000 Adam Hall Cables 4 STAR MMF 1000 [https://cdn-shop.adamhall.com/ORIGINAL/media/...
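Here is a self-contained sketch of that groupby approach, assuming a hypothetical `product` dict with placeholder links in place of the scraped data. It first builds the unwanted multi-row frame, then groups on every column except 'image' and gathers the links back into a list.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by get_product(url)
product = {
    'id': 'K4MMF1000',
    'title': 'Adam Hall Cables 4 STAR MMF 1000',
    'image': ['https://example.com/a.jpg', 'https://example.com/b.jpg'],
}

# The multi-row frame from the question: one row per image link
expanded = pd.DataFrame(product)

# Group on all other columns (difference() returns them sorted by name)
keys = expanded.columns.difference(['image']).tolist()
out = expanded.groupby(keys)['image'].agg(list).reset_index()

print(out.shape)
print(out.loc[0, 'image'])
```

Note that `groupby` drops rows whose key columns contain NaN by default, so if a scraped field can be missing, passing `dropna=False` may be needed. This route is mostly useful when the expanded frame already exists, such as after concatenating several products.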