I am trying to scrape the images. The page gives me 23 images, but I want to apply a limit so that I only get 10 images. Can you help me with this?
import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl + link['href']
        productlinks.append(comp)

data = []
for link in set(productlinks):
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    up = soup.find('div', class_='Product__SlideshowNavScroller')
    for e, pro in enumerate(up):
        t = pro.find('img').get('src')
        data.append({'id': t.split('=')[-1], 'image': 'Image ' + str(e) + ' UI', 'link': t})

df = pd.DataFrame(data)
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
df = df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
df.to_csv('kj.csv')
Answer:
Select the img tags directly and slice the resulting list with [:10], so that only the first 10 images are processed:
...
up = soup.select('div.Product__SlideshowNavScroller img')[:10]
for e, pro in enumerate(up):
    t = pro.get('src')
    data.append({'id': t.split('=')[-1], 'image': 'Image ' + str(e) + ' UI', 'link': t})
...
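A small offline check of the slicing step, using a made-up HTML snippet with 23 thumbnails in place of the real product page (the markup and the cdn.example.com URLs are assumptions, not copied from twillmkt.com):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a product page: 23 thumbnails inside the
# slideshow container (the real page markup may differ).
html = '<div class="Product__SlideshowNavScroller">' + ''.join(
    f'<a><img src="//cdn.example.com/img-{i}_160x.jpg?v=1"></a>' for i in range(1, 24)
) + '</div>'

soup = BeautifulSoup(html, 'html.parser')

# select() returns a list of Tag objects, so ordinary list slicing works.
all_imgs = soup.select('div.Product__SlideshowNavScroller img')
first_ten = all_imgs[:10]

print(len(all_imgs), len(first_ten))  # 23 10
```

Because the slice is applied before the loop, the rest of the code never sees images 11 to 23.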
If you would like the image names to start from 1 instead of 0, pass start=1 to enumerate():
...
up = soup.select('div.Product__SlideshowNavScroller img')[:10]
for e, pro in enumerate(up, start=1):
    t = pro.get('src')
    data.append({'id': t.split('=')[-1], 'image': 'Image ' + str(e) + ' UI', 'link': t})
...
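The numbering can be checked without any scraping. The URLs below are placeholders, and the f-string produces the same text as 'Image ' + str(e) + ' UI':

```python
# Hypothetical image URLs standing in for the scraped src values.
srcs = [f'//cdn.example.com/img-{i}_160x.jpg?v=1631812617' for i in range(1, 24)]

rows = []
for e, src in enumerate(srcs[:10], start=1):  # counting starts at 1, not 0
    rows.append({'image': f'Image {e} UI', 'link': src})

print(rows[0]['image'], '|', rows[-1]['image'])  # Image 1 UI | Image 10 UI
```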
EDIT
Basically, in the Excel file, after 9 entries the images get split: 5 images are stored in one row and the next 5 images in another row. The problem is that the 10 images are not stored together in one row.
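Okay, got the point. The behavior is not caused by the number of images; the issue is that the id does not uniquely identify a product, because it is not the id/SKU of the product. t.split('=')[-1] takes the ?v= value from the image URL, so images of the same product can end up under different ids and therefore in different rows.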
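To see why an id taken from the image URL misbehaves, here is a minimal sketch with made-up URLs, assuming one product whose images were uploaded at two different times (so their ?v= values differ). pivot then spreads that single product over two rows:

```python
import pandas as pd

# Made-up URLs for ONE product; the differing ?v= values are an assumption
# chosen to illustrate the problem.
urls = [
    '//cdn.example.com/denim-1_160x.jpg?v=1631812617',
    '//cdn.example.com/denim-2_160x.jpg?v=1631812617',
    '//cdn.example.com/denim-3_160x.jpg?v=1639467815',
]

# Same id logic as in the question: everything after the last '='.
data = [
    {'id': t.split('=')[-1], 'image': f'Image {e} UI', 'link': t}
    for e, t in enumerate(urls, start=1)
]
df = pd.DataFrame(data)
wide = df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
print(wide)  # two rows for a single product
```

If instead two different products happened to share a ?v= value and an image number, pivot would raise a ValueError about duplicate entries rather than splitting rows.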
How to fix?
Let's pick the SKU from the product page and use it as the id in your DataFrame:
sku = soup.select_one('.oos_sku').text.strip().split(' ')[-1]
for e, pro in enumerate(up, start=1):
    t = pro.get('src')
    data.append({'id': sku, 'image': 'Image ' + str(e) + ' UI', 'link': t})
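Putting the SKU fix together offline, with assumed markup for the .oos_sku element (the real page may format it differently) and placeholder image URLs whose ?v= values differ. With the SKU as id, all images land in one row:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Assumed markup; the answer takes the last space-separated token of the
# .oos_sku text as the SKU.
html = '''
<span class="oos_sku">SKU: LOTFEELPJ023-30</span>
<div class="Product__SlideshowNavScroller">
  <img src="//cdn.example.com/a_160x.jpg?v=1631812617">
  <img src="//cdn.example.com/b_160x.jpg?v=1639467815">
  <img src="//cdn.example.com/c_160x.jpg?v=1639467815">
</div>
'''
soup = BeautifulSoup(html, 'html.parser')

sku = soup.select_one('.oos_sku').text.strip().split(' ')[-1]
up = soup.select('div.Product__SlideshowNavScroller img')[:10]

data = [
    {'id': sku, 'image': f'Image {e} UI', 'link': img.get('src')}
    for e, img in enumerate(up, start=1)
]
df = pd.DataFrame(data)
# Categorical keeps the columns in scrape order (Image 1, 2, ..., 10) instead of
# sorting them as strings (Image 1, Image 10, Image 2, ...).
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
wide = df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
print(wide)  # one row for the product, whatever the ?v= values are
```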
Example
import requests
from bs4 import BeautifulSoup
import pandas as pd

baseurl = 'https://twillmkt.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get('https://twillmkt.com/collections/denim')
soup = BeautifulSoup(r.content, 'html.parser')
tra = soup.find_all('div', class_='ProductItem__Wrapper')
productlinks = []
for links in tra:
    for link in links.find_all('a', href=True):
        comp = baseurl + link['href']
        productlinks.append(comp)

data = []
for link in set(productlinks):
    r = requests.get(link, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    # append [:10] to this line to keep only the first 10 images per product
    up = soup.select('div.Product__SlideshowNavScroller img')
    # use the product SKU as the row id, so all images of a product share one row
    sku = soup.select_one('.oos_sku').text.strip().split(' ')[-1]
    for e, pro in enumerate(up, start=1):
        t = pro.get('src')
        data.append({'id': sku, 'image': 'Image ' + str(e) + ' UI', 'link': t})

df = pd.DataFrame(data)
# keep the image columns in scrape order instead of string order
df.image = pd.Categorical(df.image, categories=df.image.unique(), ordered=True)
df = df.pivot(index='id', columns='image', values='link').reset_index().fillna('')
df  # notebook display; use df.to_excel('test.xlsx') to write the Excel file
Output
| | id | Image 1 UI | Image 2 UI | Image 3 UI | Image 4 UI | Image 5 UI | Image 6 UI | Image 7 UI | Image 8 UI | Image 9 UI | Image 10 UI | Image 11 UI | Image 12 UI | Image 13 UI | Image 14 UI | Image 15 UI | Image 16 UI | Image 17 UI | Image 18 UI | Image 19 UI | Image 20 UI | Image 21 UI | Image 22 UI | Image 23 UI | Image 24 UI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LOTFEELPJ023-30 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-2_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-3_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-4_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-5_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-6_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-7_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-8_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-9_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-10_160x.jpg?v=1631812617 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/Blue-Ripped-Knee-Distressed-Skinny-Denim-11_160x.jpg?v=1631812617 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 47 | LOTFEELPJ564-S-BRN | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_16_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_17_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_22_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_15_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_6_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/LOTFEELPJ564_9_160x.jpg?v=1639467815 | //cdn.shopify.com/s/files/1/0089/7912/0206/products/sizechart-stretch-pants_3_ec7e0b0c-1043-4306-a766-33f7e0b3edc8_160x.png?v=1639467869 |
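The example above keeps every image (up to 24 columns in the output). A sketch that folds in the 10-image limit from the first part of this answer, written as a helper that takes the page HTML so it can be tested offline. The function name parse_product, the max_images parameter and the skip-if-no-SKU check are hypothetical additions; the selectors are the ones used above:

```python
from bs4 import BeautifulSoup


def parse_product(html, max_images=10):
    """Return long-format rows (id, image, link) for one product page.

    Hypothetical helper: selectors come from the answer; max_images and the
    missing-SKU check are illustrative additions.
    """
    soup = BeautifulSoup(html, 'html.parser')
    sku_tag = soup.select_one('.oos_sku')
    if sku_tag is None:  # page without the SKU element: skip it instead of crashing
        return []
    sku = sku_tag.text.strip().split(' ')[-1]
    imgs = soup.select('div.Product__SlideshowNavScroller img')[:max_images]
    return [
        {'id': sku, 'image': f'Image {e} UI', 'link': img.get('src')}
        for e, img in enumerate(imgs, start=1)
    ]


# Usage with made-up markup containing 23 images:
sample = (
    '<span class="oos_sku">SKU: DEMO-1</span>'
    '<div class="Product__SlideshowNavScroller">'
    + ''.join(f'<img src="//cdn.example.com/{i}.jpg">' for i in range(1, 24))
    + '</div>'
)
rows = parse_product(sample)
print(len(rows), rows[-1]['image'])  # 10 Image 10 UI
```

In the example script, data.extend(parse_product(r.content)) could replace the inner loop; BeautifulSoup accepts the raw bytes from r.content directly.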