I want to clean up the page_url field in a pandas DataFrame df. For example, df:
page_url |
---|
https%3A%2F%2Fwww.google.com%2F |
Our goal is to clean it up like below:
page_url |
---|
https://www.google.com/ |
I've tried:
df['page_url'].str.strip().replace(dict(zip(["%2F", "%3A"], ["/", ":"])), regex=True)
It works for this example. However, the page_url column has other values like '+' or other strings. Is there an alternative way to do this in Python 3 instead of writing out each string that needs to be replaced? Thanks
CodePudding user response:
import urllib.parse
urllib.parse.unquote("https%3A%2F%2Fwww.google.com%2F")
# 'https://www.google.com/'
So what we need is
# apply returns a new Series, so assign it back to keep the decoded values
df['page_url'] = df['page_url'].apply(urllib.parse.unquote)
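Since the question also mentions '+' and stray whitespace, here is a slightly fuller sketch. The sample rows, the page_url_clean column, and the decode_url helper are made up for illustration. Note that unquote leaves '+' untouched, while unquote_plus turns it into a space, which is usually only what you want for query strings or form data.

```python
import pandas as pd
from urllib.parse import unquote, unquote_plus

# Hypothetical sample data: percent-encoded URLs, padding, and a missing value
df = pd.DataFrame({
    "page_url": [
        " https%3A%2F%2Fwww.google.com%2F ",
        "https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Da%2Bb",
        None,
    ]
})

def decode_url(value):
    """Strip whitespace and percent-decode; pass non-string values through."""
    if not isinstance(value, str):
        return value
    return unquote(value.strip())

df["page_url_clean"] = df["page_url"].apply(decode_url)

# unquote keeps a literal '+', unquote_plus decodes it to a space
plain = unquote("a+b%20c")
plus = unquote_plus("a+b%20c")
```

If the column has no missing values, df['page_url'].str.strip().apply(unquote) would do the same job without the helper.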