I'm working with Python, regex, and MongoDB. A field named 'page_url' stores a single URL, such as
https://baike.baidu.hk/item/黃金分割率/24137816
or
https://baike.baidu.hk/item/物理光學/61334055#viewPageContent
I want to update every document in the collection, removing the #viewPageContent
fragment from every URL that contains it.
Thanks.
CodePudding user response:
old_url = "https://baike.baidu.hk/item/物理光學/61334055#viewPageContent"
new_url = old_url.replace("#viewPageContent", "")
print(old_url)
# https://baike.baidu.hk/item/物理光學/61334055#viewPageContent
print(new_url)
# https://baike.baidu.hk/item/物理光學/61334055
CodePudding user response:
import re
a = "https://baike.baidu.hk/item/物理光學/61334055#viewPageContent"
print(re.sub(r"#viewPageContent", '', a))
output: https://baike.baidu.hk/item/物理光學/61334055
Hope I could help you!
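Both snippets above remove the text wherever it appears in the string. If the fragment should only be dropped when it sits at the end of the URL, a small helper along these lines may be safer. This is a minimal sketch; the names strip_fragment and FRAGMENT are made up for illustration and do not come from the question.

```python
FRAGMENT = "#viewPageContent"  # hypothetical constant name, value taken from the question


def strip_fragment(url, fragment=FRAGMENT):
    """Return url without a trailing fragment; other URLs come back unchanged."""
    if url.endswith(fragment):
        # Slice off exactly the fragment's length from the end.
        return url[: -len(fragment)]
    return url


urls = [
    "https://baike.baidu.hk/item/黃金分割率/24137816",
    "https://baike.baidu.hk/item/物理光學/61334055#viewPageContent",
]
for u in urls:
    print(strip_fragment(u))
```

On Python 3.9 or later, url.removesuffix(fragment) does the same thing in one call.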
CodePudding user response:
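# Pipeline-style update; $replaceOne requires MongoDB 4.4+.
db.baike_items.update_many(
    # Match only documents whose page_url contains the fragment.
    {"page_url": {"$regex": "#viewPageContent"}},
    [{
        "$set": {"page_url": {
            "$replaceOne": {"input": "$page_url", "find": "#viewPageContent", "replacement": ""}
        }}
    }]
)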
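The update above can also be built from Python with the filter anchored to the end of the URL, so only documents whose page_url actually ends with the fragment are touched. The sketch below only constructs the filter and pipeline; build_update is a hypothetical helper name, and the usage note after it assumes pymongo 3.9+ and MongoDB 4.4+.

```python
import re

FRAGMENT = "#viewPageContent"  # value taken from the question


def build_update(field="page_url", fragment=FRAGMENT):
    """Build (filter, pipeline) for an update_many call (hypothetical helper)."""
    # Escape the fragment and anchor it with "$" so only trailing matches qualify.
    query = {field: {"$regex": re.escape(fragment) + "$"}}
    # $replaceOne swaps the first occurrence of the fragment for an empty string.
    pipeline = [{
        "$set": {field: {
            "$replaceOne": {"input": "$" + field, "find": fragment, "replacement": ""}
        }}
    }]
    return query, pipeline


query, pipeline = build_update()
print(query)
print(pipeline)
```

With a pymongo collection object, the pair could then be passed as db.baike_items.update_many(query, pipeline). Note that $replaceOne removes the first occurrence, so a URL that contains the fragment twice would need $replaceAll or a second pass.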