Python encode URL for pandas dataframe

Time:06-21

I want to clean up the page_url field in a pandas DataFrame df. For example:

df:

page_url
https%3A%2F%2Fwww.google.com%2F

Our goal is to clean it up like below:

page_url
https://www.google.com/

I've tried: df['page_url'].str.strip().replace(dict(zip(["%2F", "%3A"], ["/", ":"])), regex=True)

It works for this example. However, the page_url column contains other values, such as '+' and other strings, so I'd like to know whether there is an alternative way to do this in Python 3 instead of writing out every string that needs to be replaced. Thanks.

CodePudding user response:

import urllib.parse
urllib.parse.unquote("https%3A%2F%2Fwww.google.com%2F")
# 'https://www.google.com/'
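
Since the question mentions '+' in the column, it may help to see how unquote and unquote_plus differ. This is a minimal sketch; the sample string is made up for illustration and is not from the original data.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical sample value, not taken from the original data.
encoded = "https%3A%2F%2Fwww.google.com%2Fsearch%3Fq%3Da+b%2Bc"

# unquote decodes %XX escapes but leaves a literal '+' untouched.
plain = unquote(encoded)

# unquote_plus also turns a literal '+' into a space (form-style encoding).
form = unquote_plus(encoded)

print(plain)
print(form)
```

Which of the two is appropriate depends on how the URLs were encoded in the first place, which the question does not say.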

So what we need is

df['page_url'] = df['page_url'].apply(urllib.parse.unquote)
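
If the column may contain missing values, passing unquote directly to apply would raise a TypeError on the non-string entries. A hedged sketch of one way around that follows; the sample DataFrame and the helper name decode_url are assumptions for illustration only.

```python
import pandas as pd
from urllib.parse import unquote

# Hypothetical sample data; the None stands in for a missing URL.
df = pd.DataFrame({"page_url": ["https%3A%2F%2Fwww.google.com%2F", None]})

def decode_url(value):
    # Decode only strings; pass missing values (None/NaN) through unchanged.
    return unquote(value) if isinstance(value, str) else value

# Assign the result back, since apply returns a new Series.
df["page_url"] = df["page_url"].apply(decode_url)
print(df["page_url"].tolist())
```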