I am trying to scraping the web from there I am getting all the data and I am trying out different formats to represent data. Here is the code:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
all_info = []
for item in soup.select(".pd-info"):
title = item.h2.get_text(strip=True)
price = item.select_one('span:-soup-contains("Price")').span.text
try:
store_name = item.select_one('span.hilighted').find_next('img')['alt']
shipping = item.select_one('span.shipping').text.strip()
price_per_100_g = item.select_one('p.unit-price').text.strip()
except:
store_name = ''
shipping = ''
price_per_100_g = ''
d = {"Title": title, "Price": price, "Store": store_name, "Shipping": shipping, "Price_Per_100_g": price_per_100_g}
for i, p in enumerate(item.select(".sp-price"), 1):
try:
store_name = p.find_next("img")["alt"]
except:
store_name = ''
d[f"store_{i:>02}"] = store_name
for i, p in enumerate(item.select(".sp-price"), 1):
d[f"price_{i:>02}"] = p.get_text(strip=True)
for i, p in enumerate(item.select(".shipping"), 1):
d[f"shipping_{i:>02}"] = p.get_text(strip=True)
all_info.append(d)
df = pd.DataFrame(all_info).fillna("")
print(df.head())
df.to_csv("data_2.csv", index=False)
and I have a csv data set from the above scraper consisting of 30 columns and 31 rows Here is a small example:
Title,Price,Store,Shipping,Price_Per_100_g,store_01,store_02,store_03,store_04,store_05,store_06,store_07,store_08,store_09,store_10,store_11,store_12,store_13,store_14,store_15,store_16,store_17,store_18,store_19,store_20,store_21,store_22,store_23,store_24,store_25,price_01,price_02,price_03,price_04,price_05,price_06,price_07,price_08,price_09,price_10,price_11,price_12,price_13,price_14,price_15,price_16,price_17,price_18,price_19,price_20,price_21,price_22,price_23,price_24,price_25,shipping_01,shipping_02,shipping_03,shipping_04,shipping_05,shipping_06,shipping_07,shipping_08,shipping_09,shipping_10,shipping_11,shipping_12,shipping_13,shipping_14,shipping_15,shipping_16,shipping_17,shipping_18,shipping_19,shipping_20,shipping_21,shipping_22,shipping_23,shipping_24,shipping_25,shipping_26,store_26,price_26,shipping_27,store_27,store_28,price_27,price_28,shipping_28,shipping_29,store_29,price_29,shipping_30
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g,VetShopAustralia,Vet Products Direct,Pet Shop Direct,Petso,PetPost,Pet Chemist,Your PetPA,Pet Circle,Petbarn,World for Pets,Lucky Pet,Stefmar,Budget Pet Products,Best Friends Pets,Kellyville Pets,iPetStore,My Pet Warehouse,Pet City,Pets Unleashed,PetO,PETstock,Pet House,Habitat Pets,Pet Culture,Peticular,$64.60,$64.60,$64.95,$64.95,$64.99,$66.29,$67.32,$69.69,$69.69,$69.95,$70.19,$71.99,$72.99,$73.39,$75.15,$75.95,$76.99,$77.99,$77.99,$79.99,$81.54,$81.99,$83.49,$83.49,$83.50, $9.95 shipping, $7.95 shipping, $7.95 shipping, $4.95 to $9.95 shipping, $10.00 to $12.70 shipping,free shipping, $9.33 to $14.81 shipping, $4.00 to $11.16 shipping,free to $7.95 shipping,free shipping, $9.00 shipping, $6.95 to $10.51 shipping, $7.99 to $13.87 shipping,free to $14.35 shipping, $9.99 shipping,free to $15.95 shipping,free to $6.95 shipping,free shipping,free to $7.28 shipping, $8.95 shipping,free to $11.00 shipping,free shipping,free shipping,free to $9.99 shipping,free to $20.00 shipping, $7.95 shipping,,,,,,,,,,,,
Which I like to change the data frame to have the different stores for the same product in the rows not columns? the product name would be repeated by the store name would be the different names and then price and shipping.Here is the screenshot and Expected Output:
Title,Price,Store,Shipping,Price_Per_100_g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$57.75,Woofers World, $7.95 shipping,$5.78 per 100g
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,VetShopAustralia, $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.60,Vet Products Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Pet Shop Direct, $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.95,Petso, $4.95 to $9.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$64.99,PetPost, $10.00 to $12.70 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$66.29,Pet Chemist,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$67.32,Your PetPA, $9.33 to $14.81 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Pet Circle, $4.00 to $11.16 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.69,Petbarn,free to $7.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$69.95,World for Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$70.19,Lucky Pet, $9.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$71.99,Stefmar, $6.95 to $10.51 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$72.99,Budget Pet Products, $7.99 to $13.87 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$73.39,Best Friends Pets,free to $14.35 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.15,Kellyville Pets, $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$75.95,iPetStore,free to $15.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$76.99,My Pet Warehouse,free to $6.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pet City,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$77.99,Pets Unleashed,free to $7.28 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$79.99,PetO, $8.95 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.54,PETstock,free to $11.00 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$81.99,Pet House,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Habitat Pets,free shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.49,Pet Culture,free to $9.99 shipping,
Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg,$83.50,Peticular,free to $20.00 shipping,
Can anyone help me figure out the best way to reach the expected output? Thanks!
CodePudding user response:
Try:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "https://99petshops.com.au/Search?brandName=Ziwi Peak&animalCode=DOG&storeId=89/&page=1"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
all_info = []
for item in soup.select(".pd-info"):
title = item.h2.get_text(strip=True)
price = item.select_one('span:-soup-contains("Price")').span.text
shipping = item.select_one(".shipping")
shipping = shipping.get_text(strip=True) if shipping else ""
store_name = item.select_one(".pd-name img")
store_name = store_name["alt"] if store_name else ""
price_per_100_g = item.select_one(".unit-price")
price_per_100_g = price_per_100_g.text if price_per_100_g else ""
d = {
"Title": title,
"Price": price,
"Store": store_name,
"Shipping": shipping,
"Price_Per_100_g": price_per_100_g,
}
stores = []
for p in item.select(".sp-price"):
price = p.get_text(strip=True)
name = p.find_next("img")
name = name["alt"] if name else "N/A"
shipping = p.find_next(class_="shipping")
shipping = shipping.get_text(strip=True) if shipping else "N/A"
stores.append((name, price, shipping))
d["Stores"] = stores
all_info.append(d)
df = pd.DataFrame(all_info).fillna("").explode("Stores")
df[["Store Name", "Store Price", "Store Shipping"]] = df.pop("Stores").apply(
pd.Series
)
print(df.head())
df.to_csv("data.csv", index=False)
Prints:
Title Price Store Shipping Price_Per_100_g Store Name Store Price Store Shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Vet Products Direct $64.60 $7.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g VetShopAustralia $64.60 $7.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Petso $64.95 $10.00 to $12.70 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g Pet Shop Direct $64.95 $4.95 to $9.95 shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Recipe 1Kg $57.75 Woofers World $9.95 shipping $5.78 per 100g PetPost $64.99 free shipping
and saves data.csv
(screenshot from LibreOffice):
CodePudding user response:
One option is with pivot_longer from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
df = pd.read_excel('data_2.xlsx')
df.columns = df.columns.str.lower()
df.pivot_longer(
index = 'title',
names_to = '.value',
names_pattern = '(price|store|shipping).*'
)
title price store shipping
0 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $57.75 Woofers World $7.95 shipping
1 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $127.80 Woofers World $7.95 to $15.62 shipping
2 Ziwi Peak Dog Air-Dried Lamb Recipe 2.5Kg $127.80 Woofers World $7.95 to $15.62 shipping
3 Ziwi Peak Dog Air-Dried Beef Recipe 454g $29.25 Woofers World $7.95 to $11.82 shipping
4 Ziwi Peak Dog Air-Dried Free Range Chicken Rec... $29.25 Woofers World $7.95 to $8.68 shipping
.. ... ... ... ...
925 Ziwi Peak Dog Canned Food Venison 390g NaN NaN NaN
926 Ziwi Peak Dog Venison Shank Bone Oral Chews Half NaN NaN NaN
927 Ziwi Peak Good Dog Reward Venison 85g NaN NaN NaN
928 Ziwi Peak Dog Venison Lung & Kidney 60g NaN NaN NaN
929 Ziwi Peak Dog Wet Venison Recipe 170g X 12 NaN NaN NaN
Another option would have been pd.wide_to_long
, but it raised a ValueError
:
In [48]: pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In [48], line 1
----> 1 pd.wide_to_long(df, stubnames = ['price', 'store', 'shipping'], i='title', j='end', suffix='.*')
File ~/miniforge3/envs/playground/lib/python3.10/site-packages/pandas/core/reshape/melt.py:527, in wide_to_long(df, stubnames, i, j, sep, suffix)
524 stubnames = list(stubnames)
526 if any(col in stubnames for col in df.columns):
--> 527 raise ValueError("stubname can't be identical to a column name")
529 if not is_list_like(i):
530 i = [i]
ValueError: stubname can't be identical to a column name