Home > OS >  Python data formatting rows to columns
Python data formatting rows to columns

Time:10-28

I am trying to scraping the web from there I am getting all the data and I am trying out different formats to represent data. Here is the code:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_info = []
for item in soup.select(".pd-info"):
    title = item.h2.get_text(strip=True)
    price = item.select_one('span:-soup-contains("Price")').span.text
    try:
        store_name = item.select_one('span.hilighted').find_next('img')['alt']
        shipping = item.select_one('span.shipping').text.strip()
        price_per_100_g = item.select_one('p.unit-price').text.strip()
    except:
        store_name = ''
        shipping = ''
        price_per_100_g = ''
    d = {"Title": title, "Price": price, "Store": store_name, "Shipping": shipping, "Price_Per_100_g": price_per_100_g}
    
    for i, p in enumerate(item.select(".sp-price"), 1):
        try:
            store_name = p.find_next("img")["alt"]
        except:
            store_name = ''
        d[f"store_{i:>02}"] = store_name
    
    for i, p in enumerate(item.select(".sp-price"), 1):
        d[f"price_{i:>02}"] = p.get_text(strip=True)
    
    for i, p in enumerate(item.select(".shipping"), 1):
        d[f"shipping_{i:>02}"] = p.get_text(strip=True)

    all_info.append(d)

df = pd.DataFrame(all_info).fillna("")
print(df.head())

df.to_csv("data_2.csv", index=False)

and I have a csv data set from the above scraper consisting of 30 columns and 31 rows Here is a small example:

Title,Price,Store,Shipping,Price_Per_100_g,store_01,store_02,store_03,store_04,store_05,store_06,store_07,store_08,store_09,store_10,store_11,store_12,store_13,store_14,store_15,store_16,store_17,store_18,store_19,store_20,store_21,store_22,store_23,store_24,store_25,price_01,price_02,price_03,price_04,price_05,price_06,price_07,price_08,price_09,price_10,price_11,price_12,price_13,price_14,price_15,price_16,price_17,price_18,price_19,price_20,price_21,price_22,price_23,price_24,price_25,shipping_01,shipping_02,shipping_03,shipping_04,shipping_05,shipping_06,shipping_07,shipping_08,shipping_09,shipping_10,shipping_11,shipping_12,shipping_13,shipping_14,shipping_15,shipping_16,shipping_17,shipping_18,shipping_19,shipping_20,shipping_21,shipping_22,shipping_23,shipping_24,shipping_25,shipping_26,store_26,price_26,shipping_27,store_27,store_28,price_27,price_28,shipping_28,shipping_29,store_29,price_29,shipping_30
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g,VetShopAustralia,Vet Products Direct,Pet Shop Direct,Petso,PetPost,Pet Chemist,Your PetPA,Pet Circle,Petbarn,World for Pets,Lucky Pet,Stefmar,Budget Pet Products,Best Friends Pets,Kellyville Pets,iPetStore,My Pet Warehouse,Pet City,Pets Unleashed,PetO,PETstock,Pet House,Habitat Pets,Pet Culture,Peticular,$64.60,$64.60,$64.95,$64.95,$64.99,$66.29,$67.32,$69.69,$69.69,$69.95,$70.19,$71.99,$72.99,$73.39,$75.15,$75.95,$76.99,$77.99,$77.99,$79.99,$81.54,$81.99,$83.49,$83.49,$83.50, $9.95 shipping, $7.95 shipping, $7.95 shipping, $4.95 to $9.95 shipping, $10.00 to $12.70 shipping,free shipping, $9.33 to $14.81 shipping, $4.00 to $11.16 shipping,free to $7.95 shipping,free shipping, $9.00 shipping, $6.95 to $10.51 shipping, $7.99 to $13.87 shipping,free to $14.35 shipping, $9.99 shipping,free to $15.95 shipping,free to $6.95 shipping,free shipping,free to $7.28 shipping, $8.95 shipping,free to $11.00 shipping,free shipping,free shipping,free to $9.99 shipping,free to $20.00 shipping, $7.95 shipping,,,,,,,,,,,,

Which I like to change the data frame to have the different stores for the same product in the rows not columns? the product name would be repeated by the store name would be the different names and then price and shipping.Here is the screenshot and Expected Output:

Title,Price,Store,Shipping,Price_Per_100_g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,VetShopAustralia, $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,Vet Products Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Pet Shop Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Petso, $4.95 to $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.99,PetPost, $10.00 to $12.70 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$66.29,Pet Chemist,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$67.32,Your PetPA, $9.33 to $14.81 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Pet Circle, $4.00 to $11.16 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Petbarn,free to $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.95,World for Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$70.19,Lucky Pet, $9.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$71.99,Stefmar, $6.95 to $10.51 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$72.99,Budget Pet Products, $7.99 to $13.87 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$73.39,Best Friends Pets,free to $14.35 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.15,Kellyville Pets, $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.95,iPetStore,free to $15.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$76.99,My Pet Warehouse,free to $6.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pet City,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pets Unleashed,free to $7.28 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$79.99,PetO, $8.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.54,PETstock,free to $11.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.99,Pet House,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Habitat Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Pet Culture,free to $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.50,Peticular,free to $20.00 shipping,

Can anyone help me figure out the best way to reach the expected output? Thanks!

CodePudding user response:

Try:

import requests
import pandas as pd
from bs4 import BeautifulSoup


url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")

all_info = []
for item in soup.select(".pd-info"):
    title = item.h2.get_text(strip=True)
    price = item.select_one('span:-soup-contains("Price")').span.text

    shipping = item.select_one(".shipping")
    shipping = shipping.get_text(strip=True) if shipping else ""

    store_name = item.select_one(".pd-name img")
    store_name = store_name["alt"] if store_name else ""

    price_per_100_g = item.select_one(".unit-price")
    price_per_100_g = price_per_100_g.text if price_per_100_g else ""

    d = {
        "Title": title,
        "Price": price,
        "Store": store_name,
        "Shipping": shipping,
        "Price_Per_100_g": price_per_100_g,
    }

    stores = []
    for p in item.select(".sp-price"):
        price = p.get_text(strip=True)
        name = p.find_next("img")
        name = name["alt"] if name else "N/A"
        shipping = p.find_next(class_="shipping")
        shipping = shipping.get_text(strip=True) if shipping else "N/A"
        stores.append((name, price, shipping))
    d["Stores"] = stores
    all_info.append(d)

df = pd.DataFrame(all_info).fillna("").explode("Stores")

df[["Store Name", "Store Price", "Store Shipping"]] = df.pop("Stores").apply(
    pd.Series
)
print(df.head())

df.to_csv("data.csv", index=False)

Prints:

                                                   Title   Price          Store         Shipping Price_Per_100_g           Store Name Store Price              Store Shipping
0  Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg  $57.75  Woofers World   $9.95 shipping  $5.78 per 100g  Vet Products Direct      $64.60              $7.95 shipping
0  Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg  $57.75  Woofers World   $9.95 shipping  $5.78 per 100g     VetShopAustralia      $64.60              $7.95 shipping
0  Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg  $57.75  Woofers World   $9.95 shipping  $5.78 per 100g                Petso      $64.95   $10.00 to $12.70 shipping
0  Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg  $57.75  Woofers World   $9.95 shipping  $5.78 per 100g      Pet Shop Direct      $64.95     $4.95 to $9.95 shipping
0  Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg  $57.75  Woofers World   $9.95 shipping  $5.78 per 100g              PetPost      $64.99               free shipping

and saves data.csv (screenshot from LibreOffice):

enter image description here

CodePudding user response:

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

df = pd.read_excel('data_2.xlsx')
df.columns = df.columns.str.lower()
df.pivot_longer(
    index = 'title', 
    names_to = '.value', 
    names_pattern = '(price|store|shipping).*'
)

                                                 title    price          store                   shipping
0    Ziwi Peak Dog Air-Dried Free Range Chicken Rec...   $57.75  Woofers World             $7.95 shipping
1    Ziwi Peak Dog Air-Dried Free Range Chicken Rec...  $127.80  Woofers World   $7.95 to $15.62 shipping
2            Ziwi Peak Dog Air-Dried Lamb Recipe 2.5Kg  $127.80  Woofers World   $7.95 to $15.62 shipping
3             Ziwi Peak Dog Air-Dried Beef Recipe 454g   $29.25  Woofers World   $7.95 to $11.82 shipping
4    Ziwi Peak Dog Air-Dried Free Range Chicken Rec...   $29.25  Woofers World    $7.95 to $8.68 shipping
..                                                 ...      ...            ...                        ...
925             Ziwi Peak Dog Canned Food Venison 390g      NaN            NaN                        NaN
926   Ziwi Peak Dog Venison Shank Bone Oral Chews Half      NaN            NaN                        NaN
927              Ziwi Peak Good Dog Reward Venison 85g      NaN            NaN                        NaN
928            Ziwi Peak Dog Venison Lung & Kidney 60g      NaN            NaN                        NaN
929         Ziwi Peak Dog Wet Venison Recipe 170g X 12      NaN            NaN                        NaN

Another option would have been pd.wide_to_long, but it raised a ValueError:

In [48]: pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [48], line 1
----> 1 pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')

File ~/miniforge3/envs/playground/lib/python3.10/site-packages/pandas/core/reshape/melt.py:527, in wide_to_long(df, stubnames, i, j, sep, suffix)
    524     stubnames = list(stubnames)
    526 if any(col in stubnames for col in df.columns):
--> 527     raise ValueError("stubname can't be identical to a column name")
    529 if not is_list_like(i):
    530     i = [i]

ValueError: stubname can't be identical to a column name
  • Related