About a latin-1 and utf-8 conversion problem

Time:10-18

Hello, experts!
I'm new here, so if I've posted in the wrong place, please bear with me
(I picked this section blindly, since I don't know where this problem belongs or whether it can be solved).

I ran into a problem: the data my crawler reads comes out garbled.
For example, the website shows a compound as Butenolide a... ? , but the actual name (found by comparing with other sites) should be Butenolide Ⅱ.
Later I found that if the text is read with encoding=latin-1, it comes out as a... ?
But the site's own encoding is utf-8.

In other words, when the site was built, the data seems to have been read as latin-1 and then saved as utf-8.

There are many garbled entries like this, so fixing them one by one with text substitution is not practical. Is there a method that converts them back to their original form in bulk?
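For reference, this kind of damage can usually be undone by reversing the two steps: encode the garbled text back to bytes with the single-byte codec, then decode those bytes as utf-8. Below is a minimal Python sketch of that idea. The function name `fix_mojibake` is made up for illustration, and trying cp1252 before latin-1 is an assumption (many tools that say "latin-1" actually use Windows-1252); whether it matches what the site really did is not confirmed.

```python
def fix_mojibake(text: str) -> str:
    """Try to undo 'utf-8 bytes decoded with a single-byte codec' damage.

    Hypothetical helper: returns the input unchanged when the round trip
    fails, which is the case for text that was never garbled this way.
    """
    for codec in ("cp1252", "latin-1"):
        try:
            # Recover the original bytes, then decode them correctly.
            return text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return text


# Reproduce the damage on the example from the post, then reverse it.
original = "Butenolide \u2161"                        # Butenolide Ⅱ
garbled = original.encode("utf-8").decode("cp1252")   # utf-8 bytes misread as cp1252
repaired = fix_mojibake(garbled)
print(repaired == original)
```

If the simple round trip is not enough (for instance, text garbled more than once), the third-party ftfy library and its `ftfy.fix_text` function may be worth trying instead.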

Current status:
Data format: the crawler saves the data as CSV
Languages: I only know C#, Python, and R
Database: I only know the basics
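Since the data is already in CSV files, a batch repair could look roughly like the Python sketch below. It assumes the CSV itself is stored as utf-8 and that the garbled cells are utf-8 text that was misread as cp1252. The names `fix_cell` and `fix_csv` and the file paths are made up for illustration. Cells that do not survive the round trip are left untouched.

```python
import csv


def fix_cell(cell: str) -> str:
    """Reverse the misdecoding for one cell; keep the cell as-is on failure."""
    try:
        return cell.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return cell


def fix_csv(src_path: str, dst_path: str) -> None:
    """Copy a CSV file, repairing every cell on the way (hypothetical helper)."""
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            writer.writerow([fix_cell(cell) for cell in row])
```

Usage would be something like `fix_csv("compounds.csv", "compounds_fixed.csv")`. Writing to a new file keeps the crawled data intact in case the encoding assumption turns out to be wrong.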

Thank you very much
