I am trying to scrape data with bs4. For each page, I want to collect all the product IDs. It works for the first page, but from page 2 onward it always shows the product IDs from the first page. Here is my code (even though I changed the page to 5):
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller?page=5&page=5')
bs = BeautifulSoup(html, 'html.parser')
result = bs.find_all(lambda tag: tag.get('class') == ['product-item'])
This is the result my code gives for the 5th page
These are the product IDs of the 5th page that I want to get
I want the product IDs of the 5th page, but I don't understand why my code still shows the results of the first page.
CodePudding user response:
It appears that, including advertisements, there are 107 products. Here is a way of scraping the API endpoint directly and getting all the products:
import requests
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_colwidth', None)
headers = {
    'accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36',
}
url = 'https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&include=advertisement&aggregations=2&trackity_id=527749d7-0a68-f53e-54b5-fe2da48136f2&category=11232&page=1&sort=top_seller?page=5&urlKey=lam-sach-da-mat'
r = requests.get(url, headers=headers)
df = pd.json_normalize(r.json()['data'])
print(df)
Result:
id sku name url_key url_path type author_name book_cover brand_name short_description price list_price badges badges_new discount discount_rate rating_average review_count order_count favourite_count thumbnail_url thumbnail_width thumbnail_height freegift_items has_ebook inventory_status is_visible productset_id productset_group_name seller is_flower is_gift_card inventory url_attendant_input_form option_color stock_item salable_type seller_product_id installment_info url_review bundle_deal video_url tiki_live original_price shippable impression_info availability quantity_sold.text quantity_sold.value advertisement.ad advertisement quantity_sold
0 33606848 9815250596996 Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=33606849 None Paula's Choice 849000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'official_store', 'icon': 'https://salt.tikicdn.com/ts/upload/5d/4c/f7/0261315e75127c2ff73efd7a1f1ffdf2.png', 'icon_height': 14, 'icon_width': 68, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'freegift_items', 'placement': 'under_rating', 'text': 'Quà tặng', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 144 ASA (48k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 100000 11 4.8 42 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2e/43/38/ca1cbb77f9993e07db7ba3e107644d56.jpg 280 280 [] False available False 345 None False False None [] None 33606849 None False None None 949000 True [{'impression_id': 'thanos-product-VaFjRtzzGwSO09QA', 'metadata': {'price': 849000, 'rating_average': 4.8, 'reviews_count': 42, 'seller_product_id': 33606849}}, {'impression_id': '97c3dfe2-cf95-4161-94a3-529235d45ae1', 'metadata': {'advert_id': 3492748, 'business_id': 4769, 'flags': {'ad-2752': 5, 'p_cate': 8206, 'predictor': 'cb10', 'src': 'cat'}, 'product_id': 33606849, 'service_name': 'makesense', 'user_bucket': 570}}, 
{'impression_id': '99e7098f-6f87-4ee3-bd7a-7be637d2f402', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 207 207.0 [{'match_id': 0, 'advert_id': 0, 'business_id': 0, 'seller_id': 3946, 'clickUrl': '//tka.tiki.vn/pixel/pixel?data=djAwMUXcpZFK1GmJWEEH5wJs-tR0dD_QzrnUur3S0rkR7vFGb3Cx5LSWiqPQqZanWSaNuWGOindrBvik_aCq_9cjyQF8D906qQBOt-t08-ZoBweFhLNgyc5q11ZIVWlIHUQ19sfWQ8KQU_v-jzTMDXWv8osQqhXDUvwVkcKHHbvPz_q81AFAXFZp2IFbKnoZAYFibzaoW-UcoQiAnYZVtCWBvbQg2Qzx5TUVh6LJAgL0aMNk5tts6O8clarx2ICB8U95RnWeT6o8QjqNUl3NRakOED4nqSFEgtddT2Rci9Xqr-7vt_JEYULCGuKG2Oj7zqT-sAhWduFt3dkzhmsozBZvSURwk9vgVt1K4wvBf8wMX33iRyMCM1VIjd3PKGEV0QaEkQMGl_ulC_3fST17wZvrfdcFVqSPoGj98O63eir50lnrVNXYbpFlgDmIYMUqMs-rEkx_XvtSo76XIKoDmgn5GvLe2aewoNYvkI27vRCvV8Ufj7qhD9RAUXVFHv_DY5lVJRJ0j1vtnPYbnv8USOGUKu4RPRc93gxXukOuRxHq84a69M8zczLS25KVVfHnMmqbe3TZHvVg3zCtlc6tAiXiyJ0YhkxrhlRvEU32wpdX9Cv4M97rReqQJa7mZfFGxZ0rnWO58FSRO3Yt3xs_iAsMPHQ-0i8XTgFrh6BPaxS0xvM3EtgIjcjF63byLGz0NXcVj77whvoi2f9TFZMxy1O_Tte_Htf7TnrFNUCdo7xYVlIdymv3Jsfcy-YwW38uQ7Q9_4f9tcZGKF8BDuxaCPrKBZTp32HXOSd7zWsMX2wv3t0l4r4VjBaC2CZSSfvZNRlME4o0m-Q6YVNRb4Wk33DocSnphdBXztLhwpMaWSAFoErDNsZL9Qgqk4y-U8wb-UAV8BppXUKMpJkDwG8GxtGNUZ_PhZN4G1Jb4C-IdwLyeZxfwgcUV2LZ5k1D4WJ8lv707sAIHBCADaCxdRDJmjcX6-A7kDfpfT05W6tQzak5ElGkYC4YZxcr7TRW8EoJaV72glEkSBFdj1J5GNQqiajyh3XC88zQNSce-hI1keyZe-0qUJmZxcNEOUVoYZ7ifX-lz5jtWx_kbIYnZ_R3bvxf-FWyaNEOpXnh3s3iCluPd74Lmpea8AeLY2jGvqD77_cuvelXgbae4mP44E6uDCe_YOPF2ud3XgcssU_LoqxQrjj2x-nAYYy9tTIYPDsweEC89EKD0KciSFSB4UH5AQLzpTluPSFBGR9ki46I49xvOM3SS7fLaqcwyqWnnqMIDQ&CLICK&reqid=vFRZNdJMRr&pos=1&redirect=https://tiki.vn/dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=33606849', 'impUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMR4kJOdkR6ThYEk7cciKp_1p0OpnJbH7SMwoWHj-Kjq2Cfp3Kr1qYggzrNrQE94SkWBR63dQO0Rj9g43Lhxrv8ggtrbKXBXQrAcmo32rEnvX_c4JiVgX_dPpAdubrRE_zagc7UdYpVqjJEiBQxE1_ioxavawDQ2SN12opCjx-yV3SuJOQyd9daAyxHl76CHW8acVYXE4wCHCeuLpW4YdhZwd4gXzMxPiyQyxGUVYLOJZBObM9md59_Ow96AddUaasyn0Yry3RUv5GZ_46O7u0eFENwZDlwEE2jrz6IYGHPOre4hWQmTtK1HSZWi6UmPDyy4qDjcw45nqIbs8hFmXBoCwCEa3oRLQ1sP8iVHJvYTN1eNQblQmPqWgVECEm3bd2kEMOd2H3qoVw__KqnBfD4G4avOZ5CnN-DFQTURnjUvqeyKJNuIAsvx8CLyZx25T6Ni6S1gL16_v4X3w3-0NbRHpqZrbJ1vFYZqdXba65MtDtsLz35yyWc3fo2iNTgvXMf5qkYGiAnMNoYaP6yv0YuvAozh0ekqDyS9qbEqDIwa7R87K4f_IDuKWwrhqfvC6gLAlPZk8M3vTOi1lxV15y5jI3WLsW5-sv0T7ypCmNnv56QMxfPMDJipD5ae9XFZWpBuQRoIHYOgASgWFTKrs85Escv0JjXkcLKkNoJM5KrPzNVxs5JeQv1qLvUXzyR0UYwnc3qVBlxIpbSoZOoK2bien691jxWED6osm94CVSVv_fw7yHW12fP9sh-Req2vGQvGq3D40ndF2ag1xpAdytgsJoHVxgoAPQS58TneDHE-GBdigYJSIjIKJ5c0Hxh7pvkajdnMHqvhxR44_Zo6LRgXvE_ZL048BwQOlz0gXJrzj8rVmwALMWC6N5R4_0bLjV4kUFEdg6LhdWC3Uu6CYsW5qyXcpfB2ampI2Q9Y3vOPgWKSa59e68kaUvZU_XHqFDotp2Kxc6fsQWH3f_e9z2NF6_jlLjN1N_xO-EODoEFEYGC2lxuCHo7h0qZLQO_TsIHo2eEfTqtEfHj7uWM1gHnlyf7mi2esr2Z2ow2Ul8NYYlSoMi_HHdynp-ogLWbva8_Zg-2h27X8&SHOW&reqid=vFRZNdJMRr&pos=1', 'trueImpUrl': 
'//tka.tiki.vn/pixel/pixel?data=djAwMcxZvpGC5q0kmKZdC0f7E7NTPgcQz3QIsrPU8HOdN3XcwRPLThqpL4IOS8wEXwiUOs06YO5ZboJEkr9lfAKAxN3tx0uDN278ihaJR75TE4VxdG2GeapxtNnUPEAY_92eVJfWBCwT2w4bGphxkBPuXbfRGo2viNSDHGPBz0EQbY94JAW8aYFV-O0_zl71Umd-6E5gd1-MlcnwMuV-VFQIKrjyTS1Udes_05EiM0KNl6glpvJPbN0dnL1SwCHHIWYZoxBNUfnfnS6aFqieWT-Zy_JirTZ3zWycjMX7gGaNkel3nUf4lQz4v7y6VHDkbfNtuKMJ5-OpzOF4A39-_ehIGtMohJjhx853kTRf3M5CO6_Fsw7LOKlKBPt6vioGblWSYiMF7fgO7qmrpv6Pv5DIaMEjJVE1qaGiKZmClN0xRLQS47bUFfrv7MTpAgCpj8YneFs0q7to23HiInMZpMSYBhFIXFIMk_RtA32f2pTeO399OG1_fV6J05VxACqfzfm_1zxWt1LSlUGME_Sb1F25uJP542ZPru8sgo6Q5Vy46A-zpRfk02MqXefuHEtw2cSRKxAzaK4yK7xyZiPK8cBarDgbPv8TuEpsT0bhu6x9lOB-fyvTZFAN8LS9uQ1nfeFn2icW0d472Q6w1TWtO5IC_fst1or5KHW9qKC_P4EDERumpobf7GAb-34ojWcAQtOWPHJpmqRGvLV0MR_ISNFwIkXmOW8Bgytfkuenox9_7Niqg9x3qEEdmA1Rr0R0KbEkiIkZRDS7ZsGeVIVjn1cTSaLbGwLNIVAIytKyNa6_BhYNvnjihOFjVweuIaNhvjmyWAMht-X9NkzNnWsQPNZy2ha3JABfCw4fkD4cOk8m16NH6avet0fG1cLGOYmKOneUWk7ihIWWRNFB0OoN2HQDsREfhQ2qEUWBrmDPHI-1Slt5MbcnS8cm0sF8mBu5LO4-RV-9i3e7VI0Ti7Cbcj5f6kD8QVkQSAijsSCDAwKA3oRPpjl7eeCu1qrmXCMW2VEBe0ftOFZSNg6PKQ-Pyvrxwkdpjf_yICGdOfGH_t0KNPi3DnKOlVrTKjg_osk&VIEW&reqid=vFRZNdJMRr&pos=1', 'properties': {'product_id': '33606849', 'matched_query': 0, 'image': '', 'url': ''}, 'type': 'ProductAdvertType', 'impression_info': [], 'image_ratio': 0}] NaN NaN
1 33606786 5722576750824 Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060 lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786 lotion-tay-te-bao-chet-lam-sang-da-paula-s-choice-skin-perfecting-8-aha-lotion-100ml-2060-p33606786.html?spid=66640131 None Paula's Choice 663000 0 [] [{'code': 'tikinow', 'icon': 'https://salt.tikicdn.com/ts/upload/3f/76/87/4c636b7bea11521f46f733b7839df4de.png', 'icon_height': 16, 'icon_width': 32, 'placement': 'delivery_info', 'text': 'Giao siêu tốc 2H', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 73 ASA (24k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 20 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2f/6d/8d/081edbe77b16439c4fa0b18263cbede7.jpg 280 280 [] False available False 345 None False False None [] None 66640131 None False None None 663000 True [{'impression_id': 'thanos-product-QvglqBASVoDvOhFS', 'metadata': {'price': 663000, 'rating_average': 4.7, 'reviews_count': 20, 'seller_product_id': 66640131}}, {'impression_id': '75f5fa8b-3b32-4280-86a4-141778a1cb1f', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 36 36.0 NaN NaN NaN
2 11239286 9792297299199 Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml) gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286 gel-tay-da-chet-arrahan-lemon-white-peeling-gel-180ml-p11239286.html?spid=20116852 None Arrahan 61900 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 7 ASA (2k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 77 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/93/cb/da/afd6b13fe3654bf4351b260b801c41e3.jpg 280 280 [] False available False 345 None False False None [] None 20116852 None False None None 61900 True [{'impression_id': 'thanos-product-UDr0lE1YpujdRftZ', 'metadata': {'price': 61900, 'rating_average': 4.7, 'reviews_count': 77, 'seller_product_id': 20116852}}, {'impression_id': 'e81a5e80-1c98-4d19-abdc-a60250309814', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 539 539.0 NaN NaN NaN
3 33606848 8573828662870 Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848 dung-dich-loai-bo-te-bao-chet-lam-mem-da-paula-s-choice-resist-daily-smoothing-treatment-with-5-aha-50-ml-p33606848.html?spid=66638723 None Paula's Choice 529000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 58 ASA (19k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.7 7 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/f8/10/ef/714f6b435ade504ce920caeff4ace16f.jpg 280 280 [] False available False 345 None False False None [] None 66638723 None False None None 529000 True [{'impression_id': 'thanos-product-zTwvu1Q7UONamIJN', 'metadata': {'price': 529000, 'rating_average': 4.7, 'reviews_count': 7, 'seller_product_id': 66638723}}, {'impression_id': '95894dfd-049c-41e4-848a-e179e8a5c03b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 27 27.0 NaN NaN NaN
4 20525156 3751926198377 Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156 tay-te-bao-chet-3w-clinic-collagen-crystal-peeling-gel-180ml-p20525156.html?spid=20525157 None 3W Clinic 119000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 3 ASA (981 ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 31000 21 4.7 12 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/9a/01/46/71fa72df01b8addc69770f67b3bcedab.jpg 280 280 [] False available False 345 None False False None [] None 20525157 None False None None 150000 True [{'impression_id': 'thanos-product-avJJKrffpf79iq8i', 'metadata': {'price': 119000, 'rating_average': 4.7, 'reviews_count': 12, 'seller_product_id': 20525157}}, {'impression_id': 'c9e9e842-8065-4d9e-8269-0d3b466e311b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 51 51.0 NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
102 4701145 6917512701766 Kem Tẩy tế bào chết cho mặt Byphasse Exfoliant Face Scrub Dành cho mọi loại da kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145 kem-tay-te-bao-chet-cho-mat-byphasse-exfoliant-face-scrub-danh-cho-moi-loai-da-p4701145.html?spid=27924960 None Byphasse 119000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 20 ASA (7k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 4.0 8 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/a1/7c/77/7acfba66ad481b870be5fdb1d10a4662.jpg 280 280 [] False available False 345 None False False None [] None 27924960 None False None None 119000 True [{'impression_id': 'thanos-product-Khf0Kz3w7kEtxJ1U', 'metadata': {'price': 119000, 'rating_average': 4, 'reviews_count': 8, 'seller_product_id': 27924960}}, {'impression_id': 'cabf20ec-d814-48ba-b000-7126ed1a22d5', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 49 49.0 NaN NaN NaN
103 38465349 8446201287222 Gel Tẩy Tế Bào Chết Keana Baking Soda Moist Peeling (120G) - HÀNG CHÍNH HÃNG gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349 gel-tay-te-bao-chet-keana-baking-soda-moist-peeling-120g-hang-chinh-hang-p38465349.html?spid=38465350 None Keana 421200 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_badge', 'icon': 'https://salt.tikicdn.com/ts/upload/d6/51/17/cde193f3d0f6da18147a739247c95c93.png', 'icon_height': 20, 'icon_width': 53, 'placement': 'bottom', 'type': 'asa_reward'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 72 ASA (24k ₫)<br/>≈ 5.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 118800 22 4.5 2 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/7e/a0/59/8dd959d52b59306d83523204062ad713.jpg 280 280 [] False available False 345 None False False None [] None 38465350 None False None None 540000 True [{'impression_id': 'thanos-product-Rcuvm8ucH1kfYALI', 'metadata': {'price': 421200, 'rating_average': 4.5, 'reviews_count': 2, 'seller_product_id': 38465350}}, {'impression_id': '3281c96d-3a6c-4b26-838b-4b50e9f9a618', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 3 3.0 NaN NaN NaN
104 15213464 4407455438680 Tẩy bào chết Belif Mild And Effective Facial Scrub 100ml tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464 tay-bao-chet-belif-mild-and-effective-facial-scrub-100ml-p15213464.html?spid=76083479 None Belif 630000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 16 ASA (5k ₫)<br/>≈ 0.8% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 0.0 0 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/2a/cf/b8/fb265c0ce6944bbb6aa822eca1642be3.png 280 280 [] False available False 345 None False False None [] None 76083479 None False None None 630000 True [{'impression_id': 'thanos-product-v8EPXz3gtAHsbWNz', 'metadata': {'price': 630000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 76083479}}, {'impression_id': 'f801f336-5090-46f9-ba75-41a7e22a0dc2', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 1 1.0 NaN NaN NaN
105 51088975 9244203860400 Dấm táo The Inkey List Apple Cider Vinegar Acid Peel 30ml dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975 dam-tao-the-inkey-list-apper-cider-vinegar-acid-peel-30ml-p51088975.html?spid=51088976 None The Inkey List 589000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'trusted_store', 'icon': 'https://salt.tikicdn.com/ts/upload/e8/6a/e3/7f998ef1eb5ab0536aac53f02a698c8a.png', 'icon_height': 14, 'icon_width': 54, 'placement': 'top', 'type': 'icon_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 65 ASA (21k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 0 0 5.0 1 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/87/6f/86/0bae14bd8ebd26ae57a95f8bb47de9da.png 280 280 [] False available False 345 None False False None [] None 51088976 None False None None 589000 True [{'impression_id': 'thanos-product-RKuBQgdldtq0QZut', 'metadata': {'price': 589000, 'rating_average': 5, 'reviews_count': 1, 'seller_product_id': 51088976}}, {'impression_id': '4a762b28-db43-4da1-a039-6bf01d078413', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 Đã bán 3 3.0 NaN NaN NaN
106 24408456 5696255831404 Gel Giúp Loại Bỏ Tế Bào Chết IASO gel-giup-loai-bo-te-bao-chet-p24408456 gel-giup-loai-bo-te-bao-chet-p24408456.html?spid=24408458 None IASO 441000 0 [] [{'code': 'delivery_info_badge', 'placement': 'delivery_info', 'text': 'Giao tiết kiệm', 'type': 'delivery_info_badge'}, {'code': 'freeship_plus', 'placement': 'under_rating', 'text': 'Freeship ', 'type': 'under_rating_text'}, {'code': 'asa_reward_html_badge', 'placement': 'under_price', 'text': 'Tặng tới 49 ASA (16k ₫)<br/>≈ 3.6% hoàn tiền', 'text_color': '#808089', 'type': 'asa_reward_html'}] 49000 10 0.0 0 0 0 https://salt.tikicdn.com/cache/280x280/ts/product/78/29/34/b52258f69bfe3349bfe9c55a7dd9095c.jpg 280 280 [] False available False 345 None False False None [] None 24408458 None False None None 490000 True [{'impression_id': 'thanos-product-mSLaaOhB4aLTtZa7', 'metadata': {'price': 441000, 'rating_average': 0, 'reviews_count': 0, 'seller_product_id': 24408458}}, {'impression_id': '4fd674dd-ee1d-459a-9c0d-1cf00d99662b', 'metadata': {'product_id': 345, 'service_name': 'reco', 'version': 'p_category_mpid_listing_v1_202211190600'}}] 1 NaN NaN NaN NaN NaN
CodePudding user response:
By the way, you can check which page number the HTML actually contains with something like soup.select_one('li > a.current[data-view-id="product_list_pagination_item"][data-view-label]').get('data-view-label').
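The same check can be sketched with only the standard library, which makes it easy to try on a saved copy of the HTML. The class name CurrentPageFinder, the helper current_page_label and the sample fragment are hypothetical; the markup shape is only inferred from the selector above, and the sketch skips the `li >` parent test for brevity.

```python
from html.parser import HTMLParser


class CurrentPageFinder(HTMLParser):
    """Remember data-view-label of the first pagination link marked 'current'."""

    def __init__(self):
        super().__init__()
        self.current_label = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.current_label is not None:
            return
        attr = dict(attrs)
        # Attribute values can be None for valueless attributes.
        classes = (attr.get("class") or "").split()
        if ("current" in classes
                and attr.get("data-view-id") == "product_list_pagination_item"):
            self.current_label = attr.get("data-view-label")


def current_page_label(html_text):
    """Return the label of the current pagination link, or None if absent."""
    finder = CurrentPageFinder()
    finder.feed(html_text)
    return finder.current_label


# Hypothetical fragment shaped like the pagination markup the selector implies.
sample = (
    '<ul>'
    '<li><a class="current" data-view-id="product_list_pagination_item" '
    'data-view-label="1" href="?page=1">1</a></li>'
    '<li><a data-view-id="product_list_pagination_item" '
    'data-view-label="5" href="?page=5">5</a></li>'
    '</ul>'
)
print(current_page_label(sample))
```

If the label printed for a page-5 link is still 1, the HTML you fetched is the first page.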
Explanation: No matter which page the link points to, the first page is always loaded first, and the page is then updated dynamically (with JavaScript and API calls). You can see this in the Network tab of devtools: you might have to refresh the page after opening it, and make sure the "Preserve log" option is not checked. Click on the name of the first request in the log (it should end the same way as the link in the address bar); the HTML in the "Response" tab is what requests.get (or urlopen) fetches, and you might notice that it is the HTML of the first page.
If you scroll through the other requests in the log, you should find one to
https://tiki.vn/api/personalish/v1/blocks/listings?limit=40&include=advertisement&aggregations=2&trackity_id=3dddf2b8-1eb2-e891-0cdf-c23b37663c28&category=11232&page=5&sort=top_seller?page=5&urlKey=lam-sach-da-mat
and the products are probably loaded from it. All of the params appear to be fixed, or can be found in the page URL, except for trackity_id. If you look at the request initiator chain, you can see which JavaScript file made the request, and you could try to figure out how trackity_id is generated; personally, though, I'd find it easier to just use selenium.
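To make the parameter list concrete, here is a minimal sketch that assembles the listings URL for a given page from the values visible in the request above. The function name listing_url and its defaults are hypothetical, the sort value is assumed to be plain top_seller, and whether the server accepts a request without trackity_id is not checked by this code.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_BASE = "https://tiki.vn/api/personalish/v1/blocks/listings"


def listing_url(page, category=11232, url_key="lam-sach-da-mat",
                sort="top_seller", limit=40, trackity_id=None):
    """Build the listings URL for one page; trackity_id is optional here."""
    params = {
        "limit": limit,
        "include": "advertisement",
        "aggregations": 2,
        "category": category,
        "page": page,
        "sort": sort,
        "urlKey": url_key,
    }
    if trackity_id is not None:
        params["trackity_id"] = trackity_id
    return f"{API_BASE}?{urlencode(params)}"


url = listing_url(5)
print(url)
# Round-trip check: read the page number back out of the query string.
print(parse_qs(urlsplit(url).query)["page"])
```

Building the query with urlencode keeps each parameter separate, so a stray second "?" cannot end up inside the sort value.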
Suggested Solution 1: It appears that you can actually use the API with only the params we already know (category, urlKey, sort):
import cloudscraper

# request up to 300 products in one call, using only the known params
r = cloudscraper.create_scraper().get('https://tiki.vn/api/personalish/v1/blocks/listings?limit=300&category=11232&sort=top_seller?page=5&urlKey=lam-sach-da-mat')
productList = r.json()['data']
print('### [{id}_{sku}: {name}] for first 10 products of', f'{len(productList)} ###\n')
for p in productList[:10]:
    print(f"{p['id']}_{p['sku']}: {p['name']}")
(I used cloudscraper because I'm not very familiar with urlopen, and I'm also not good at setting the right headers with requests to avoid 403 errors.) This prints
### [{id}_{sku}: {name}] for first 10 products of 100 ###
33606786_5722576750824: Lotion tẩy tế bào chết làm sáng da Paula’s Choice Skin Perfecting 8% AHA Lotion 100ml 2060
11239286_9792297299199: Gel tẩy da chết Arrahan Lemon White Peeling Gel (180ml)
33606848_8573828662870: Kem tẩy da chết làm trắng sáng và đều màu da Paula’s Choice RESIST Daily Smoothing Treatment With 5% AHA 50 ml - 7660
20525156_3751926198377: Tẩy Tế Bào Chết 3W Clinic Collagen Crystal Peeling Gel 180ml
67089667_9204550497315: Combo 2 chai tiện lợi - Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
21481823_9335684703529: Gel tẩy tế bào chết sáng da hồng sâm Hàn Quốc My Gold Korea Red Ginseng Peeling Gel (130ml) – Hàng Chính Hãng
46203526_8584500833846: Bông Tẩy Da Chết Cosrx One-Step Original Clear Pad 70 Sheets (New 2019)
1941543_2999847759227: Kem tẩy tế bào chết mặt Organic Shop Organic Coffee & Powder 75ml
57783000_9733773668061: Natureine AQUA PEEL Moisture Peeling Gel - Gel tẩy tế bào da chết, cấp ẩm Nhật Bản - Chính Hãng
7319657_7325473003642: Trial Tinh chất dành cho da mụn cao cấp Resist BHA 9 0.83 ml
However, I feel like there should have been more than just 100 products - paginating with selenium (below) indicated that there should be 177 products.
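One way to look for the missing products would be to request the API page by page instead of relying on a single large limit. The sketch below shows only the loop logic: collect_products and fake_fetch are hypothetical names, and the stopping rule (a page that adds nothing new) assumes the API returns empty or repeated data past the last page, which I have not verified. Items are keyed by (id, sku) because the same id can appear with different skus.

```python
def collect_products(fetch_page, max_pages=50):
    """Call fetch_page(1), fetch_page(2), ... and merge items by (id, sku)."""
    seen = {}
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        new_items = [p for p in items if (p["id"], p.get("sku")) not in seen]
        if not new_items:
            break  # assumed end of listing: nothing new on this page
        for p in new_items:
            seen[(p["id"], p.get("sku"))] = p
    return list(seen.values())


# Stand-in for a real request; a real fetch_page would call the listings
# endpoint with a page parameter and return r.json()['data'].
def fake_fetch(page):
    pages = {
        1: [{"id": 1, "sku": "a"}, {"id": 2, "sku": "b"}],
        2: [{"id": 2, "sku": "b"}, {"id": 3, "sku": "c"}],
    }
    return pages.get(page, [])


products = collect_products(fake_fetch)
print([p["id"] for p in products])
```

Passing the fetch function in as an argument keeps the loop testable without touching the network.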
Suggested Solution 2: You can loop through the pages using this function that I wrote to get and parse the HTML (with selenium + bs4):
maxPages = 10  # or as you prefer
nextUrl = 'https://tiki.vn/lam-sach-da-mat/c11232?sort=top_seller'
pgi_sel = 'data-view-id="product_list_pagination_item"'
for pn in range(1, maxPages + 1):
    # element to wait for: the pagination link of page pn, marked as current
    # (the class test mirrors the a.current selector used below)
    curPage_xpath = f'//li/a[@class="current"][@{pgi_sel}][@data-view-label="{pn}"]'
    soup = linkToSoup_selenium(nextUrl, ecx=curPage_xpath)
    if soup is None or isinstance(soup, str):
        break
    ###################### EXTRACT DATA ######################
    # this is just printing the page# and 1st five IDs, but you can extract whatever you need from soup at this point
    curPg = soup.select_one(f'li > a.current[{pgi_sel}][data-view-label]')
    curPg = f'page {curPg.get("data-view-label")}' if curPg else '!! page ERROR !!'
    pageProds = soup.select('a.product-item[href*=".html?spid="]')
    curPg += f" [{len(pageProds)} products]:"
    first5ids = [a.get('href').split('.html?spid=')[-1] for a in pageProds][:5]
    print(f'{curPg:>22} ', " ".join([f'{i:>10}' for i in first5ids]), '...')
    ##########################################################
    # follow the "next page" arrow unless it is missing or disabled
    nxtPg = soup.select_one(f'li > a[{pgi_sel}][href]:has(img[alt="arrow-right"])')
    if nxtPg is None or 'disabled' in nxtPg.get('class', ''):
        break
    nextUrl = nxtPg.get('href')
and that printed
page 1 [40 products]: 66640131 20116852 66638723 20525157 67089668 ...
page 2 [40 products]: 63465592 20911921 54388844 58555745 13385021 ...
page 3 [40 products]: 1515345 57703788 1060978 54929902 2076819 ...
page 4 [40 products]: 35737314 26299382 7029351 14970693 32139853 ...
page 5 [11 products]: 52274203 51988147 50422842 36828505 45439018 ...
(If you don't want to limit to maxPages, you can use something like while True instead of for pn in range(1, maxPages + 1), but then you'll also need a counter or something similar to get pn for ecx, since that's what tells the function to wait until that part of the HTML is loaded.)
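As a rough sketch of that unbounded variant, itertools.count can supply pn without a manual counter. Everything here is illustrative: walk_pages and get_page are hypothetical names, and get_page stands in for a wrapper that loads one page (for example with the selenium function above) and returns its items together with the next URL, or None if loading failed.

```python
from itertools import count


def walk_pages(first_url, get_page):
    """Follow next-page links until none is left, yielding (pn, items)."""
    next_url = first_url
    for pn in count(1):              # pn replaces the manual counter
        result = get_page(next_url, pn)
        if result is None:           # the page failed to load
            break
        items, next_url = result
        yield pn, items
        if next_url is None:         # no enabled "next" link
            break


# Stand-in for real page loading, keyed by URL.
def fake_get_page(url, pn):
    site = {"/p1": (["a", "b"], "/p2"), "/p2": (["c"], None)}
    return site.get(url)


for pn, items in walk_pages("/p1", fake_get_page):
    print(pn, items)
```

The loop still ends on its own because it breaks as soon as a page fails to load or has no next link.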