Converting a list as a dictionary value to a cell in pandas

Time:01-05

I have been scraping a product from this website: https://www.adamhall.com/shop/gi-en/cables-connectors/pre-assembled-cables/microphone-cables/3323/4-star-mmf-1000. I collect the image links as a list inside a product dictionary, and I want that whole list to end up as a single cell value in the 'image' column of a DataFrame. Instead, the resulting DataFrame has as many rows as there are image links.

Here is my code so far:

from requests_html import HTMLSession
import pandas as pd


url = 'https://www.adamhall.com/shop/gi-en/cables-connectors/pre-assembled-cables/microphone-cables/3323/4-star-mmf-1000'

# product_properties=

def get_product(url):
  s = HTMLSession()
  r = s.get(url)
  
  images = r.html.find('img.js-zoom-image')
  links=[]
  for image in images:
    link = image.attrs['data-zoom']
    links.append(link)

  product = {
    'id': r.html.find('div.right-item', first=True).text.strip(),
    'title': r.html.find('h1.articlename', first=True).text.strip().replace('\n',' '),
    'description':r.html.find('div.description >p', first=True).text.strip(),
    'details': r.html.find('div.js-accordion__content.specification__content', first=True).text.strip(),
    'image':links,
    
    }
  return product

AHdf=pd.DataFrame(get_product(url))

print(AHdf)

Here is what gets returned:

              id  ...                                              image
0  K4MMF1000  ...  https://cdn-shop.adamhall.com/ORIGINAL/media/M...
1  K4MMF1000  ...  https://cdn-shop.adamhall.com/ORIGINAL/media/M...

I would like it to have just one row, with all the image links as a list of items, separated by a comma in one cell in the 'image' column.

CodePudding user response:

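Wrap the dictionary returned by your function in a list. pandas then treats it as a single record (one row) instead of broadcasting the scalar values across the length of the image list: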

#                   v----------------v
AHdf = pd.DataFrame([get_product(url)])
print(AHdf)

# Output
          id                             title                                        description                                            details                                              image
0  K4MMF1000  Adam Hall Cables 4 STAR MMF 1000  Professional, balanced microphone cable practi...  Cable Length\n10 m\nColor\nBlack\nCable diamet...  [https://cdn-shop.adamhall.com/ORIGINAL/media/...

Another way is to use json_normalize:

AHdf = pd.json_normalize(get_product(url))
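To see the difference without hitting the website, here is a minimal offline sketch. The product dictionary below is a hypothetical stand-in for what get_product(url) returns, and the example.com image links are placeholders, not the real URLs.

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by get_product(url);
# the image URLs are placeholders, not the real links.
product = {
    "id": "K4MMF1000",
    "title": "Adam Hall Cables 4 STAR MMF 1000",
    "image": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
}

# Dict passed directly: the list sets the row count and the
# scalar values are repeated on every row.
exploded = pd.DataFrame(product)

# Dict wrapped in a list: one record, so one row, and the list
# stays intact inside the 'image' cell.
wrapped = pd.DataFrame([product])

# json_normalize on a single dict also produces one row.
normalized = pd.json_normalize(product)

print(exploded.shape)
print(wrapped.shape)
print(wrapped.loc[0, "image"])
```

If you need a comma-separated string rather than a Python list in the cell, one option is to build it before creating the DataFrame, for example with ', '.join(links) in the product dictionary.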

CodePudding user response:

If you already have the multi-row DataFrame, you can collapse it back to one row per product with groupby followed by agg:

out = AHdf.groupby(AHdf.columns.difference(['image']).tolist())['image'].agg(list).reset_index()

Output:

                                         description                                            details         id                             title                                              image
0  Professional, balanced microphone cable practi...  Cable Length\n10 m\nColor\nBlack\nCable diamet...  K4MMF1000  Adam Hall Cables 4 STAR MMF 1000  [https://cdn-shop.adamhall.com/ORIGINAL/media/...
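A small self-contained sketch of the same idea, assuming a two-row frame shaped like the one in the question. The data is made up for illustration (placeholder example.com links, only two non-image columns), and the names exploded, key_cols, as_list and as_text are hypothetical.

```python
import pandas as pd

# Hypothetical multi-row frame, shaped like the one in the question:
# every column repeats except 'image'.
exploded = pd.DataFrame({
    "id": ["K4MMF1000", "K4MMF1000"],
    "title": ["Adam Hall Cables 4 STAR MMF 1000"] * 2,
    "image": ["https://example.com/a.jpg", "https://example.com/b.jpg"],
})

# Group on every column except 'image'. Index.difference returns
# the remaining column names in sorted order.
key_cols = exploded.columns.difference(["image"]).tolist()

# Collect the image links of each group into a Python list.
as_list = exploded.groupby(key_cols)["image"].agg(list).reset_index()

# Variant: join the links into one comma-separated string instead.
as_text = exploded.groupby(key_cols)["image"].agg(", ".join).reset_index()

print(as_list)
print(as_text.loc[0, "image"])
```

Note that the grouping columns come back in sorted order, which is why the output above starts with description rather than id. The groupby route is mostly useful when the frame already exists; when you control how the DataFrame is built, wrapping the dictionary in a list avoids the extra step.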