How can I fix output showing \xa017?

Time:05-26

I scrape some data from the web and it all looks good. However, once I process the data and perform some string operations on it, the final output shows some characters as Unicode escape codes (for example \xa0). How can I fix it?

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.fed.cuhk.edu.hk/cri/faculty/prof-lee-kit-bing-icy/')
soup = BeautifulSoup(r.text, 'html.parser')

ref = soup.select('h5:-soup-contains("Selected Publications") ~ ol:nth-of-type(1) li')[-1]
publication_dict = {}

# journal pages and periodical
# note: periodical is not defined in this snippet; presumably it is set earlier
if ref.text[ref.text.find(ref.em.text) + len(ref.em.text) + 2:-1] == "":
    publication_dict['remamin_information'] = None

else:
    if periodical is not None:
        publication_dict['remamin_information'] = (periodical + ref.text[ref.text.find(ref.em.text) + len(ref.em.text):-1])
    else:
        publication_dict['remamin_information'] = (ref.text[ref.text.find(ref.em.text) + len(ref.em.text):-1])

publication_dict


CodePudding user response:

When you print a list or dict, Python displays each element using its debug representation (the result of repr()), which helps identify unprintable characters. If you print the string itself, you'll see its display representation:

>>> d = {'remamin_information':',\xa017(2), 69-85.\r\n '}
>>> d     # display the dict.  Elements use debug representation.
{'remamin_information': ',\xa017(2), 69-85.\r\n '}
>>> d['remamin_information']  # The REPL uses a debug representation
',\xa017(2), 69-85.\r\n '
>>> print(d['remamin_information'])   # the \xa0 is actually a NO-BREAK SPACE
, 17(2), 69-85.                       # and the \r\n becomes a line break

There's nothing to "convert back to normal". Just make sure to print() strings to see their display representation.
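To make the distinction concrete, here is a minimal sketch comparing the two representations of the string from the example above. It also shows one optional way to replace the no-break space and trim the trailing whitespace, in case plain spaces are wanted in the stored value. The helper name clean_text is hypothetical and not part of the original code.

```python
# The string from the example above, as it is stored in the dict.
raw = ',\xa017(2), 69-85.\r\n '

# repr() is what the REPL and dict/list display use: escapes stay visible.
debug_form = repr(raw)

# str() is what print() uses: the actual characters are shown.
display_form = str(raw)


def clean_text(text):
    """Hypothetical helper: swap NO-BREAK SPACE for a plain space and
    strip leading/trailing whitespace (including the trailing \\r\\n)."""
    return text.replace('\xa0', ' ').strip()


print(debug_form)
print(display_form)
print(repr(clean_text(raw)))
```

The cleanup step is optional: the data itself is already correct, and clean_text only matters if the no-break space or the trailing line break would cause trouble later, for example when comparing strings.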
