Crawl multiple pages with python but other pages show same results as first page

Time:11-21

I am trying to crawl data with bs4. For each page, I want to collect all product IDs. It works for the first page, but from page 2 onward the result always shows the product IDs from the first page. Here is my code (even though I changed the page to 5):

from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen('https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller&page=5')
bs = BeautifulSoup(html, 'html.parser')

# keep only tags whose class attribute is exactly ['product-item']
result = bs.find_all(lambda tag: tag.get('class') == ['product-item'])

Here is the result my code gives for the 5th page

This is what I want: the product IDs of the 5th page

I want to get the product IDs of the 5th page, but I don't understand why my code still shows the results of the first page.

CodePudding user response:

It appears that, including advertising, there are 107 products. Here is a way of scraping the API endpoint directly to get all products:

import requests
import pandas as pd

pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)

headers = {'accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
}

url = 'https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&include=advertisement&aggregations=2&trackity_id=527749d7-0a68-f53e-54b5-fe2da48136f2&category=11232&page=1&sort=top_seller&urlKey=lam-sach-da-mat'
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data'])
print(df)

Result:

    id  sku name    url_key url_path    type    author_name book_cover  brand_name  short_description   price   list_price  badges  badges_new  discount    discount_rate   rating_average  review_count    order_count favourite_count thumbnail_url   thumbnail_width thumbnail_height    freegift_items  has_ebook   inventory_status    is_visible  productset_id   productset_group_name   seller  is_flower   is_gift_card    inventory   url_attendant_input_form    option_color    stock_item  salable_type    seller_product_id   installment_info    url_review  bundle_deal video_url   tiki_live   original_price  shippable   impression_info availability    quantity_sold.text  quantity_sold.value advertisement.ad    advertisement   quantity_sold
0   33606848    9815250596996   Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660   dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=33606849          None    Paula's Choice      849000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'official_store', 'icon': 'https://salt.tikicdn.com/ts/upload/5d/4c/f7/0261315e75127c2ff73efd7a1f1ffdf2.png', 'icon_height': 14, 'icon_width': 68, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'freegift_items', 'placement': 'under_rating', 'text': 'Quà tặng', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 144 ASA (48k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 100000  11  4.8 42  0   0   https://salt.tikicdn.com/cache/280x280/ts/product/2e/43/38/ca1cbb77f9993e07db7ba3e107644d56.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        33606849    None        False   None    None    949000  True    [{'impression_id': 'thanos-product-VaFjRtzzGwSO09QA', 'metadata': {'price': 849000, 'rating_average': 4.8, 'reviews_count': 42, 'seller_product_id': 33606849}}, {'impression_id': '97c3dfe2-cf95-4161-94a3-529235d45ae1', 'metadata': {'advert_id': 3492748, 'business_id': 4769, 'flags': {'ad-2752': 5, 'p_cate': 8206, 'predictor': 'cb10', 'src': 
'cat'}, 'product_id': 33606849, 'service_name': 'makesense', 'user_bucket': 570}}, {'impression_id': '99e7098f-6f87-4ee3-bd7a-7be637d2f402', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1   Đã bán 207  207.0   [{'match_id': 0, 'advert_id': 0, 'business_id': 0, 'seller_id': 3946, 'clickUrl': '//tka.tiki.vn/pixel/pixel?data=djAwMUXcpZFK1GmJWEEH5wJs-tR0dD_QzrnUur3S0rkR7vFGb3Cx5LSWiqPQqZanWSaNuWGOindrBvik_aCq_9cjyQF8D906qQBOt-t08-ZoBweFhLNgyc5q11ZIVWlIHUQ19sfWQ8KQU_v-jzTMDXWv8osQqhXDUvwVkcKHHbvPz_q81AFAXFZp2IFbKnoZAYFibzaoW-UcoQiAnYZVtCWBvbQg2Qzx5TUVh6LJAgL0aMNk5tts6O8clarx2ICB8U95RnWeT6o8QjqNUl3NRakOED4nqSFEgtddT2Rci9Xqr-7vt_JEYULCGuKG2Oj7zqT-sAhWduFt3dkzhmsozBZvSURwk9vgVt1K4wvBf8wMX33iRyMCM1VIjd3PKGEV0QaEkQMGl_ulC_3fST17wZvrfdcFVqSPoGj98O63eir50lnrVNXYbpFlgDmIYMUqMs-rEkx_XvtSo76XIKoDmgn5GvLe2aewoNYvkI27vRCvV8Ufj7qhD9RAUXVFHv_DY5lVJRJ0j1vtnPYbnv8USOGUKu4RPRc93gxXukOuRxHq84a69M8zczLS25KVVfHnMmqbe3TZHvVg3zCtlc6tAiXiyJ0YhkxrhlRvEU32wpdX9Cv4M97rReqQJa7mZfFGxZ0rnWO58FSRO3Yt3xs_iAsMPHQ-0i8XTgFrh6BPaxS0xvM3EtgIjcjF63byLGz0NXcVj77whvoi2f9TFZMxy1O_Tte_Htf7TnrFNUCdo7xYVlIdymv3Jsfcy-YwW38uQ7Q9_4f9tcZGKF8BDuxaCPrKBZTp32HXOSd7zWsMX2wv3t0l4r4VjBaC2CZSSfvZNRlME4o0m-Q6YVNRb4Wk33DocSnphdBXztLhwpMaWSAFoErDNsZL9Qgqk4y-U8wb-UAV8BppXUKMpJkDwG8GxtGNUZ_PhZN4G1Jb4C-IdwLyeZxfwgcUV2LZ5k1D4WJ8lv707sAIHBCADaCxdRDJmjcX6-A7kDfpfT05W6tQzak5ElGkYC4YZxcr7TRW8EoJaV72glEkSBFdj1J5GNQqiajyh3XC88zQNSce-hI1keyZe-0qUJmZxcNEOUVoYZ7ifX-lz5jtWx_kbIYnZ_R3bvxf-FWyaNEOpXnh3s3iCluPd74Lmpea8AeLY2jGvqD77_cuvelXgbae4mP44E6uDCe_YOPF2ud3XgcssU_LoqxQrjj2x-nAYYy9tTIYPDsweEC89EKD0KciSFSB4UH5AQLzpTluPSFBGR9ki46I49xvOM3SS7fLaqcwyqWnnqMIDQ&CLICK&reqid=vFRZNdJMRr&pos=1&redirect=https://tiki.vn/dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=33606849', 'impUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMR4kJOdkR6ThYEk7cciKp_1p0OpnJbH7SMwoWHj-Kjq2Cfp3Kr1qYggzrNrQE94SkWBR63dQO0Rj9g43Lhxrv8ggtrbKXBXQrAcmo32rEnvX_c4JiVgX_dPpAdubrRE_zagc7UdYpVqjJEiBQxE1_ioxavawDQ2SN12opCjx-yV3SuJOQyd9daAyxHl76CHW8acVYXE4wCHCeuLpW4YdhZwd4gXzMxPiyQyxGUVYLOJZBObM9md59_Ow96AddUaasyn0Yry3RUv5GZ_46O7u0eFENwZDlwEE2jrz6IYGHPOre4hWQmTtK1HSZWi6UmPDyy4qDjcw45nqIbs8hFmXBoCwCEa3oRLQ1sP8iVHJvYTN1eNQblQmPqWgVECEm3bd2kEMOd2H3qoVw__KqnBfD4G4avOZ5CnN-DFQTURnjUvqeyKJNuIAsvx8CLyZx25T6Ni6S1gL16_v4X3w3-0NbRHpqZrbJ1vFYZqdXba65MtDtsLz35yyWc3fo2iNTgvXMf5qkYGiAnMNoYaP6yv0YuvAozh0ekqDyS9qbEqDIwa7R87K4f_IDuKWwrhqfvC6gLAlPZk8M3vTOi1lxV15y5jI3WLsW5-sv0T7ypCmNnv56QMxfPMDJipD5ae9XFZWpBuQRoIHYOgASgWFTKrs85Escv0JjXkcLKkNoJM5KrPzNVxs5JeQv1qLvUXzyR0UYwnc3qVBlxIpbSoZOoK2bien691jxWED6osm94CVSVv_fw7yHW12fP9sh-Req2vGQvGq3D40ndF2ag1xpAdytgsJoHVxgoAPQS58TneDHE-GBdigYJSIjIKJ5c0Hxh7pvkajdnMHqvhxR44_Zo6LRgXvE_ZL048BwQOlz0gXJrzj8rVmwALMWC6N5R4_0bLjV4kUFEdg6LhdWC3Uu6CYsW5qyXcpfB2ampI2Q9Y3vOPgWKSa59e68kaUvZU_XHqFDotp2Kxc6fsQWH3f_e9z2NF6_jlLjN1N_xO-EODoEFEYGC2lxuCHo7h0qZLQO_TsIHo2eEfTqtEfHj7uWM1gHnlyf7mi2esr2Z2ow2Ul8NYYlSoMi_HHdynp-ogLWbva8_Zg-2h27X8&SHOW&reqid=vFRZNdJMRr&pos=1', 'trueImpUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMcxZvpGC5q0kmKZdC0f7E7NTPgcQz3QIsrPU8HOdN3XcwRPLThqpL4IOS8wEXwiUOs06YO5ZboJEkr9lfAKAxN3tx0uDN278ihaJR75TE4VxdG2GeapxtNnUPEAY_92eVJfWBCwT2w4bGphxkBPuXbfRGo2viNSDHGPBz0EQbY94JAW8aYFV-O0_zl71Umd-6E5gd1-MlcnwMuV-VFQIKrjyTS1Udes_05EiM0KNl6glpvJPbN0dnL1SwCHHIWYZoxBNUfnfnS6aFqieWT-Zy_JirTZ3zWycjMX7gGaNkel3nUf4lQz4v7y6VHDkbfNtuKMJ5-OpzOF4A39-_ehIGtMohJjhx853kTRf3M5CO6_Fsw7LOKlKBPt6vioGblWSYiMF7fgO7qmrpv6Pv5DIaMEjJVE1qaGiKZmClN0xRLQS47bUFfrv7MTpAgCpj8YneFs0q7to23HiInMZpMSYBhFIXFIMk_RtA32f2pTeO399OG1_fV6J05VxACqfzfm_1zxWt1LSlUGME_Sb1F25uJP542ZPru8sgo6Q5Vy46A-zpRfk02MqXefuHEtw2cSRKxAzaK4yK7xyZiPK8cBarDgbPv8TuEpsT0bhu6x9lOB-fyvTZFAN8LS9uQ1nfeFn2icW0d472Q6w1TWtO5IC_fst1or5KHW9qKC_P4EDERumpobf7GAb-34ojWcAQtOWPHJpmqRGvLV0MR_ISNFwIkXmOW8Bgytfkuenox9_7Niqg9x3qEEdmA1Rr0R0KbEkiIkZRDS7ZsGeVIVjn1cTSaLbGwLNIVAIytKyNa6_BhYNvnjihOFjVweuIaNhvjmyWAMht-X9NkzNnWsQPNZy2ha3JABfCw4fkD4cOk8m16NH6avet0fG1cLGOYmKOneUWk7ihIWWRNFB0OoN2HQDsREfhQ2qEUWBrmDPHI-1Slt5MbcnS8cm0sF8mBu5LO4-RV-9i3e7VI0Ti7Cbcj5f6kD8QVkQSAijsSCDAwKA3oRPpjl7eeCu1qrmXCMW2VEBe0ftOFZSNg6PKQ-Pyvrxwkdpjf_yICGdOfGH_t0KNPi3DnKOlVrTKjg_osk&VIEW&reqid=vFRZNdJMRr&pos=1', 'properties': {'product_id': '33606849', 'matched_query': 0, 'image': '', 'url': ''}, 'type': 'ProductAdvertType', 'impression_info': [], 'image_ratio': 0}]    NaN NaN
1   33606786    5722576750824   Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060  lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786 lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786.html?spid=66640131          None    Paula's Choice      663000  0   []  [{'code': 'tikinow', 'icon': 'https://salt.tikicdn.com/ts/upload/3f/76/87/4c636b7bea11521f46f733b7839df4de.png', 'icon_height': 16, 'icon_width': 32, 'placement': 'delivery_info', 'text': 'Giao siêu tốc 2H', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 73 ASA (24k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]  0   0   4.7 20  0   0   https://salt.tikicdn.com/cache/280x280/ts/product/2f/6d/8d/081edbe77b16439c4fa0b18263cbede7.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        66640131    None        False   None    None    663000  True    [{'impression_id': 'thanos-product-QvglqBASVoDvOhFS', 'metadata': {'price': 663000, 'rating_average': 4.7, 'reviews_count': 20, 'seller_product_id': 66640131}}, {'impression_id': '75f5fa8b-3b32-4280-86a4-141778a1cb1f', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}]  1   Đã bán 36   36.0    NaN NaN NaN
2   11239286    9792297299199   Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml) gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286 gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286.html?spid=20116852          None    Arrahan     61900   0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 7 ASA (2k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]   0   0   4.7 77  0   0   https://salt.tikicdn.com/cache/280x280/ts/product/93/cb/da/afd6b13fe3654bf4351b260b801c41e3.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        20116852    None        False   None    None    61900   True    [{'impression_id': 'thanos-product-UDr0lE1YpujdRftZ', 'metadata': {'price': 61900, 'rating_average': 4.7, 'reviews_count': 77, 'seller_product_id': 20116852}}, {'impression_id': 'e81a5e80-1c98-4d19-abdc-a60250309814', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}]   1   Đã bán 539  539.0   NaN NaN NaN
3   33606848    8573828662870   Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660   dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=66638723          None    Paula's Choice      529000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 58 ASA (19k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0   0   4.7 7   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/f8/10/ef/714f6b435ade504ce920caeff4ace16f.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        66638723    None        False   None    None    529000  True    [{'impression_id': 'thanos-product-zTwvu1Q7UONamIJN', 'metadata': {'price': 529000, 'rating_average': 4.7, 'reviews_count': 7, 'seller_product_id': 66638723}}, {'impression_id': '95894dfd-049c-41e4-848a-e179e8a5c03b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}]   1   Đã bán 27   27.0    NaN NaN NaN
4   20525156    3751926198377   Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml    tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156  tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156.html?spid=20525157           None    3W Clinic       119000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 3 ASA (981 ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]    31000   21  4.7 12  0   0   https://salt.tikicdn.com/cache/280x280/ts/product/9a/01/46/71fa72df01b8addc69770f67b3bcedab.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        20525157    None        False   None    None    150000  True    [{'impression_id': 'thanos-product-avJJKrffpf79iq8i', 'metadata': {'price': 119000, 'rating_average': 4.7, 'reviews_count': 12, 'seller_product_id': 20525157}}, {'impression_id': 'c9e9e842-8065-4d9e-8269-0d3b466e311b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}]  1   Đã bán 51   51.0    NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
102 4701145 6917512701766   Kem Tẩy tế bào chết cho mặt Byphasse Exfoliant Face Scrub Dành cho mọi loại da  kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145 kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145.html?spid=27924960          None    Byphasse        119000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 20 ASA (7k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]    0   0   4.0 8   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/a1/7c/77/7acfba66ad481b870be5fdb1d10a4662.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        27924960    None        False   None    None    119000  True    [{'impression_id': 'thanos-product-Khf0Kz3w7kEtxJ1U', 'metadata': {'price': 119000, 'rating_average': 4, 'reviews_count': 8, 'seller_product_id': 27924960}}, {'impression_id': 'cabf20ec-d814-48ba-b000-7126ed1a22d5', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1   Đã bán 49   49.0    NaN NaN NaN
103 38465349    8446201287222   Gel Tẩy Tế Bào Chết Keana Baking Soda Moist Peeling (120G) - HÀNG CHÍNH HÃNG    gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349  gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349.html?spid=38465350           None    Keana       421200  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 72 ASA (24k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]   118800  22  4.5 2   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/7e/a0/59/8dd959d52b59306d83523204062ad713.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        38465350    None        False   None    None    540000  True    [{'impression_id': 'thanos-product-Rcuvm8ucH1kfYALI', 'metadata': {'price': 421200, 'rating_average': 4.5, 'reviews_count': 2, 'seller_product_id': 38465350}}, {'impression_id': '3281c96d-3a6c-4b26-838b-4b50e9f9a618', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}]   1   Đã bán 3    3.0 NaN NaN NaN
104 15213464    4407455438680   Tẩy bào chết Belif Mild And Effective Facial Scrub 100ml    tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464  tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464.html?spid=76083479           None    Belif       630000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 16 ASA (5k ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]  0   0   0.0 0   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/2a/cf/b8/fb265c0ce6944bbb6aa822eca1642be3.png 280 280 []  False   available   False   345     None    False   False   None        []  None        76083479    None        False   None    None    630000  True    [{'impression_id': 'thanos-product-v8EPXz3gtAHsbWNz', 'metadata': {'price': 630000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 76083479}}, {'impression_id': 'f801f336-5090-46f9-ba75-41a7e22a0dc2', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1   Đã bán 1    1.0 NaN NaN NaN
105 51088975    9244203860400   Dấm táo The Inkey List Apple Cider Vinegar Acid Peel 30ml   dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975 dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975.html?spid=51088976          None    The Inkey List      589000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 65 ASA (21k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0   0   5.0 1   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/87/6f/86/0bae14bd8ebd26ae57a95f8bb47de9da.png 280 280 []  False   available   False   345     None    False   False   None        []  None        51088976    None        False   None    None    589000  True    [{'impression_id': 'thanos-product-RKuBQgdldtq0QZut', 'metadata': {'price': 589000, 'rating_average': 5, 'reviews_count': 1, 'seller_product_id': 51088976}}, {'impression_id': '4a762b28-db43-4da1-a039-6bf01d078413', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1   Đã bán 3    3.0 NaN NaN NaN
106 24408456    5696255831404   Gel Giúp Loại Bỏ Tế Bào Chết IASO   gel-giup-loai-bo-te-bao-chet-p24408456  gel-giup-loai-bo-te-bao-chet-p24408456.html?spid=24408458           None    IASO        441000  0   []  [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 49 ASA (16k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}]   49000   10  0.0 0   0   0   https://salt.tikicdn.com/cache/280x280/ts/product/78/29/34/b52258f69bfe3349bfe9c55a7dd9095c.jpg 280 280 []  False   available   False   345     None    False   False   None        []  None        24408458    None        False   None    None    490000  True    [{'impression_id': 'thanos-product-mSLaaOhB4aLTtZa7', 'metadata': {'price': 441000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 24408458}}, {'impression_id': '4fd674dd-ee1d-459a-9c0d-1cf00d99662b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1   NaN NaN NaN NaN NaN
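Since the question only asks for product IDs, the full DataFrame is more than is needed. Here is a minimal sketch, assuming the JSON has the shape shown in the result above (a top-level 'data' list whose entries carry 'id' and 'seller_product_id'). The helper name extract_product_ids is hypothetical, and the sample payload is trimmed from rows 0 and 1 of the output:

```python
def extract_product_ids(payload):
    """Return (id, seller_product_id) pairs from a listings response dict."""
    return [
        (item["id"], item.get("seller_product_id"))
        for item in payload.get("data", [])
    ]


# Trimmed sample mirroring rows 0 and 1 of the result above.
sample = {
    "data": [
        {"id": 33606848, "seller_product_id": 33606849},
        {"id": 33606786, "seller_product_id": 66640131},
    ]
}

print(extract_product_ids(sample))
```

With the real response this would be called as extract_product_ids(r.json()). Note that seller_product_id is the value that appears after spid= in each product URL.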

CodePudding user response:

Btw, you can check which page number the fetched HTML actually contains with something like soup.select_one('li > a.current[data-view-id="product_list_pagination_item"][data-view-label]').get('data-view-label').

Explanation: No matter which page the link points to, the first page is always loaded first, and then the page is updated dynamically (with JavaScript and API calls). You can see this in the Network tab of devtools (you might have to refresh the page after opening it, and make sure that the "Preserve log" option is not checked): click on the name of the first request in the log, which should end the same way as the link in the address bar. The HTML in its "Response" tab is what requests.get fetches, and you might notice that it is the HTML of the first page.

If you scroll through the other requests in the log, you should find one to https://tiki.vn/api/personalish/v1/blocks/listings?limit=40&include=advertisement&aggregations=2&trackity_id=3dddf2b8-1eb2-e891-0cdf-c23b37663c28&category=11232&page=5&sort=top_seller&urlKey=lam-sach-da-mat

and the products are probably loaded from this. All of the params appear to be fixed, or can be found in the page URL, except for trackity_id. If you look at the request initiator chain, you can see which JavaScript file made the request, and you could try to figure out how trackity_id is generated; personally, though, I'd find it easier to just use selenium.
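To make the parameter list concrete, the request URL can be assembled from its parts rather than pasted as one long string. This is only a sketch: build_listing_url is a hypothetical helper, the default values are copied from the URL seen in the network log, and trackity_id is left optional because how it is generated is unknown.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_BASE = "https://tiki.vn/api/personalish/v1/blocks/listings"


def build_listing_url(page, category=11232, url_key="lam-sach-da-mat",
                      sort="top_seller", limit=40, trackity_id=None):
    """Assemble the listings URL for one page from its query parameters."""
    params = {
        "limit": limit,
        "include": "advertisement",
        "aggregations": 2,
        "category": category,
        "page": page,
        "sort": sort,
        "urlKey": url_key,
    }
    # Only send trackity_id when the caller actually has one.
    if trackity_id is not None:
        params["trackity_id"] = trackity_id
    return f"{API_BASE}?{urlencode(params)}"


url = build_listing_url(page=5)
print(url)

# Round-trip check: each parameter comes back as its own key.
query = parse_qs(urlsplit(url).query)
print(query["page"], query["sort"])
```

Letting urlencode join the parameters also avoids hand-typing separators, so every key stays separate from the value before it.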


Suggested Solution 1: It appears that you can actually use the API with only the params we already know (category, urlKey, sort):

import cloudscraper

# only the known params are sent: limit, category, sort, urlKey
r = cloudscraper.create_scraper().get('https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&category=11232&sort=top_seller&urlKey=lam-sach-da-mat')
productList = r.json()['data']
print('### [{id}_{sku}: {name}] for first 10 products of', f'{len(productList)} ###\n')
for p in productList[:10]:
    print(f"{p['id']}_{p['sku']}: {p['name']}")

(I used cloudscraper because I'm not very familiar with urlopen, and I'm also not good at setting the right headers with requests to avoid 403 errors.) This prints

### [{id}_{sku}: {name}] for first 10 products of 100 ###

33606786_5722576750824: Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060
11239286_9792297299199: Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml)
33606848_8573828662870: Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660
20525156_3751926198377: Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml
67089667_9204550497315: Combo 2 chai tiện lợi - Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
21481823_9335684703529: Gel tẩy tế bào chết sáng da hồng sâm Hàn Quốc My Gold Korea Red Ginseng Peeling Gel (130ml) – Hàng Chính Hãng
46203526_8584500833846: Bông Tẩy Da Chết Cosrx One-Step Original Clear Pad 70 Sheets (New 2019)
1941543_2999847759227: Kem tẩy tế bào chết mặt Organic Shop Organic Coffee & Powder 75ml
57783000_9733773668061: Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
7319657_7325473003642: Trial Tinh chất dành cho da mụn cao cấp Resist BHA 9 0.83 ml

However, I feel like there should have been more than just 100 products - paginating with selenium (below) indicated that there should be 177 products.
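One thing that might be worth trying for the missing products is requesting the API page by page instead of relying on one large limit. This is a sketch under unverified assumptions: that the endpoint honours the page parameter (as the page=5 request in the network log suggests) and that it returns an empty 'data' list past the last page. fetch_page is a hypothetical callable that you supply, which keeps the loop itself free of network code:

```python
def collect_all_products(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... and merge the 'data' lists.

    fetch_page(page) must return the decoded JSON dict for that page.
    Stops at the first empty page or after max_pages requests.
    Entries are de-duplicated by their (id, sku) pair.
    """
    seen = set()
    products = []
    for page in range(1, max_pages + 1):
        data = fetch_page(page).get("data") or []
        if not data:
            break
        for item in data:
            key = (item.get("id"), item.get("sku"))
            if key not in seen:
                seen.add(key)
                products.append(item)
    return products


# Offline stand-in for the real API: two pages, then nothing.
fake_pages = {
    1: {"data": [{"id": 1, "sku": "a"}, {"id": 2, "sku": "b"}]},
    2: {"data": [{"id": 2, "sku": "b"}, {"id": 3, "sku": "c"}]},
}


def fake_fetch(page):
    return fake_pages.get(page, {"data": []})


all_products = collect_all_products(fake_fetch)
print([p["id"] for p in all_products])
```

A real fetch_page would wrap the cloudscraper call from above and return r.json() for the given page number. De-duplication matters here because advertised products can repeat an organic listing.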


Suggested Solution 2: You can loop through the pages using this function that I wrote (linkToSoup_selenium, called below) to get and parse the HTML (with selenium + bs4):

maxPages = 10  # or as you prefer
nextUrl = 'https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller'
pgi_sel = 'data-view-id="product_list_pagination_item"'
for pn in range(1, maxPages + 1):
    curPage_xpath = f'//li/a[@class="current"][@{pgi_sel}][@data-view-label="{pn}"]'
    soup = linkToSoup_selenium(nextUrl, ecx=curPage_xpath)
    if soup is None or type(soup) == str: break

    ###################### EXTRACT DATA ######################
    # this is just printing the page# and 1st five IDs, but you can extract whatever you need from soup at this point
    curPg = soup.select_one(f'li > a.current[{pgi_sel}][data-view-label]')
    curPg = f'page {curPg.get("data-view-label")}' if curPg else '!! page ERROR !!'

    pageProds = soup.select('a.product-item[href*=".html?spid="]')
    curPg += f" [{len(pageProds)} products]:"
    first5ids = [a.get('href').split('.html?spid=')[-1] for a in pageProds][:5]
    print(f'{curPg:>22} ', " ".join([f'{i:>10}' for i in first5ids]), '...')
    ##########################################################

    nxtPg = soup.select_one(f'li > a[{pgi_sel}][href]:has(img[alt="arrow-right"])')
    if nxtPg is None or 'disabled' in nxtPg.get('class', ''): break
    nextUrl = nxtPg.get('href')

and that printed

 page 1 [40 products]:    66640131   20116852   66638723   20525157   67089668 ...
 page 2 [40 products]:    63465592   20911921   54388844   58555745   13385021 ...
 page 3 [40 products]:     1515345   57703788    1060978   54929902    2076819 ...
 page 4 [40 products]:    35737314   26299382    7029351   14970693   32139853 ...
 page 5 [11 products]:    52274203   51988147   50422842   36828505   45439018 ...

(If you don't want to limit to maxPages, you can just use something like while True instead of for pn in range(maxPages), but then you'll also need to use a counter or something to get pn for ecx, since that's what tells the function to wait until that part of the html is loaded.)
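The open-ended variant described above could look roughly like this. It is only a sketch of the control flow: load_soup stands in for linkToSoup_selenium(url, ecx=...) and extract for the per-page parsing. Both are hypothetical callables passed in so that the loop can run without a browser, and itertools.count supplies pn.

```python
from itertools import count


def crawl_all_pages(first_url, load_soup, extract):
    """Follow next-page links until there are none left.

    load_soup(url, pn) -> parsed page, or None / an error string on failure
    extract(soup)      -> (items, next_url); next_url is None on the last page
    """
    results = []
    url = first_url
    for pn in count(1):  # pn = 1, 2, 3, ... with no upper bound
        soup = load_soup(url, pn)
        if soup is None or isinstance(soup, str):
            break
        items, url = extract(soup)
        results.append((pn, items))
        if url is None:
            break
    return results


# Offline stand-in: three fake "pages" keyed by URL.
fake_site = {
    "/p1": (["a", "b"], "/p2"),
    "/p2": (["c"], "/p3"),
    "/p3": (["d"], None),
}

pages = crawl_all_pages(
    "/p1",
    load_soup=lambda url, pn: fake_site.get(url),
    extract=lambda soup: soup,
)
print(pages)
```

In the real loop, load_soup would build the XPath for page pn and pass it as ecx, and extract would return the product links together with the href of the arrow-right pagination link.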
