I have a MongoDB database named baike. It contains a collection named baike_items whose documents have the following fields:
id
title
baike_id
page_url
text
All fields are fine except page_url. Some of the URLs are normal, like:
'https://baike.baidu.hk/item/契丹族/2390374'
But some URLs end with the string #viewPageContent, like:
https://baike.baidu.hk/item/
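
If the goal is to strip the trailing #viewPageContent from the affected page_url values, one possible approach is sketched below in Python. It assumes the collection is accessed through pymongo and that every document has the usual MongoDB _id key. The helper names strip_suffix and clean_page_urls are hypothetical and introduced only for illustration.

```python
import re

SUFFIX = "#viewPageContent"


def strip_suffix(url, suffix=SUFFIX):
    """Return url without the trailing suffix; leave other URLs unchanged."""
    if url.endswith(suffix):
        return url[: -len(suffix)]
    return url


def clean_page_urls(collection, suffix=SUFFIX):
    """Rewrite page_url on every document whose value ends with suffix.

    `collection` is assumed to be a pymongo Collection, for example
    MongoClient()["baike"]["baike_items"]. Returns the number of
    documents that were rewritten.
    """
    # Match only documents whose page_url ends with the suffix.
    query = {"page_url": {"$regex": re.escape(suffix) + "$"}}
    # Materialize the matches first so updates do not affect the iteration.
    matches = list(collection.find(query, {"page_url": 1}))
    for doc in matches:
        collection.update_one(
            {"_id": doc["_id"]},
            {"$set": {"page_url": strip_suffix(doc["page_url"], suffix)}},
        )
    return len(matches)
```

It may be safer to try clean_page_urls on a copy of the collection first, since the update overwrites page_url in place.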