Importing files with Chinese characters into R

I should post this on Stack Overflow.
Here’s some explanation of ASCII and UTF encodings: https://stackoverflow.com/questions…
My problem:
Our questionnaire contains Chinese characters. As you can imagine, all the location names are in Chinese, as are people’s names, seed varieties, herbicides and pesticides, and the strategies they use. In Excel these characters display fine, and they also did the first time I processed them in R (thinking about it now, that may be because I was using my Mac instead of my Lenovo).
Anyway, the challenge is that I need to process the data in R, so I exported all the Excel files to CSV with the UTF-8 encoding option. However, no matter how many times I tried, with different functions (the read.table family, readr) and different encoding settings (the Windows default, which is windows-1252, and ISO), the Chinese characters still did not display properly.
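For reference, here is a minimal sketch of the kind of import call I mean. The file and its contents are made up for illustration (a temporary file written as real UTF-8 bytes), so this shows what the call looks like when the file genuinely is UTF-8, not my actual data.

```r
# Stand-in for an exported questionnaire file (made-up data, written as UTF-8 bytes).
path <- tempfile(fileext = ".csv")
con <- file(path, open = "wb")
writeLines(enc2utf8(c("village,crop", "\u5317\u4eac,\u6c34\u7a3b")), con, useBytes = TRUE)
close(con)

# Option 1: keep the bytes as they are and mark the strings as UTF-8.
dat <- read.csv(path, encoding = "UTF-8", stringsAsFactors = FALSE)

# Option 2: re-encode from UTF-8 to the session's native encoding while reading.
# dat <- read.csv(path, fileEncoding = "UTF-8", stringsAsFactors = FALSE)

# Option 3 (needs the readr package):
# dat <- readr::read_csv(path, locale = readr::locale(encoding = "UTF-8"))

print(dat)
print(Encoding(dat$village))
```

If the characters still come out garbled with calls like these, the file itself is probably not in the encoding you think it is.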
Some tips I found helpful for understanding the possible errors:
These tips didn’t solve my problem, though. It’s not about choosing UTF-8, or fileEncoding versus encoding. In the end it was Excel’s problem; I found this answer and it helped a lot: https://stackoverflow.com/questions…
It turns out that even if you select UTF-8 encoding when converting xlsx to CSV, Excel doesn’t actually write UTF-8. So you can either export the data as txt, or upload it to Google Sheets and download it as a CSV. You also have to turn off the option that automatically opens downloaded files, because once Excel opens the CSV, the encoding is messed up again.
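One way to check whether an exported file really is UTF-8 before blaming R is to test its bytes with validUTF8() (base R, version 3.3.0 or later). This is only a sketch: is_utf8_file is a helper name I made up, and both test files are synthetic stand-ins, one written as UTF-8 and one holding bytes from a legacy, non-UTF-8 encoding.

```r
# Hypothetical helper: TRUE only if every line of the file is valid UTF-8.
is_utf8_file <- function(path) {
  lines <- readLines(path, warn = FALSE)
  all(validUTF8(lines))
}

# A file that really is UTF-8 (made-up content).
good <- tempfile(fileext = ".csv")
con <- file(good, open = "wb")
writeLines(enc2utf8(c("name", "\u5317\u4eac")), con, useBytes = TRUE)
close(con)

# A file with raw bytes that are not valid UTF-8, as a legacy code page would produce.
bad <- tempfile(fileext = ".csv")
writeBin(as.raw(c(0x6e, 0x0a, 0xb1, 0xb1, 0xbe, 0xa9, 0x0a)), bad)

is_utf8_file(good)  # TRUE
is_utf8_file(bad)   # FALSE
```

A FALSE here means the export step did not produce UTF-8, so no encoding argument on the R side that assumes UTF-8 will fix it.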
Another thing to watch out for: in Import Dataset, the Data Viewer preview may not show the characters with the proper encoding. My data imported properly once I finished the importing process rather than just looking at the Data Viewer.
I also found that UTF-8 behaves differently from UTF8 as an encoding name (though I can’t prove which one is correct), but I think my main problem here was caused by Excel.
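A quick way to see which spelling your own machine accepts is to look it up in iconvlist(), which lists the encoding names known to the iconv that R uses. The result depends on the platform, so treat this as a sketch for checking your setup and not as proof of which name is "correct".

```r
# Which spellings does this platform's iconv know about?
known <- iconvlist()
supported <- c("UTF-8", "UTF8") %in% known
names(supported) <- c("UTF-8", "UTF8")
print(supported)
```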
