Hello, experts!
I'm new here, so if I have posted in the wrong place, please bear with me.
(I also picked this board more or less at random, and I don't know whether this problem can be solved.)
I ran into a problem with garbled text when my crawler reads data from a website.
For example, a compound is shown on the website as Butenolide â…¡, but it is actually (as I found by comparing with other sites) Butenolide Ⅱ.
Later I found that if Ⅱ is read with encoding=latin-1, it comes out as â…¡.
But the site itself is encoded as utf-8.
In other words, when the site was built, the data was apparently read as latin-1 and then saved as utf-8.
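To show what I mean, here is a minimal Python sketch that reproduces the garbling for this one name and reverses it. It assumes the "latin-1" step was really Windows-1252 (cp1252), because the "…" character only exists there; with Python's strict latin-1 that byte becomes an invisible control character instead. The function names garble and ungarble are my own, and I have not verified that this holds for every value in my data.

```python
def garble(text: str) -> str:
    """Simulate the suspected corruption: UTF-8 bytes misread as cp1252."""
    return text.encode("utf-8").decode("cp1252")


def ungarble(text: str) -> str:
    """Reverse it: recover the original bytes via cp1252, then decode as UTF-8.

    Raises UnicodeError if the text was not garbled in exactly this way.
    """
    return text.encode("cp1252").decode("utf-8")


original = "Butenolide \u2161"      # Butenolide followed by ROMAN NUMERAL TWO
broken = garble(original)           # what the website displays
restored = ungarble(broken)         # back to the original name

print(broken)
print(restored)
```

For this single example the round trip gives back the original name, which is why I suspect this kind of double encoding.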
There are many similar garbled values, so I can't fix them one by one with batch text replacement. Is there a method to convert them all back to their original form?
Current status:
Data format: the crawler saves the data as CSV
Language: I only know C#, Python, and R
Database: I only know the basics
Thank you very much