UTF-8 encoding with Czech characters

Time:09-13

I created an R project, but I am struggling with some Czech characters. For example,

print("Příliš žluťoučký kůň ů úpěl ďábelské ódy")

returns

[1] "Príliš žlutoucký kun u úpel dábelské ódy"

Some characters are printed properly, but most are not. The same characters are also garbled when I create a data frame from this string:

View(data.frame("Příliš žluťoučký kůň ů úpěl ďábelské ódy"))

CodePudding user response:

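It looks like your locale is not set to UTF-8. You can change this with Sys.setlocale. In the code below, the first call switches to the C locale to reproduce the problem, the second switches to a UTF-8 locale, and the final group lists the locale names used on different platforms.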

Sys.setlocale("LC_ALL", "C")
print("Příliš žluťoučký kůň ů úpěl ďábelské ódy")
#[1] "P\305\231\303\255li\305\241 \305\276lu\305\245ou\304\215k\303\275 k\305\257\305\210 \305\257 \303\272p\304\233l \304\217\303\241belsk\303\251 \303\263dy"

Sys.setlocale("LC_ALL", "de_DE.UTF-8")  # Linux, macOS, other Unix-alikes
print("Příliš žluťoučký kůň ů úpěl ďábelské ódy")
#[1] "Příliš žluťoučký kůň ů úpěl ďábelské ódy"

Sys.setlocale("LC_ALL", "de")     # Solaris: details are OS-dependent
Sys.setlocale("LC_ALL", "de_DE")  # Many Unix-alikes
Sys.setlocale("LC_ALL", "de_DE.UTF-8")  # Linux, macOS, other Unix-alikes
Sys.setlocale("LC_ALL", "de_DE.utf8")   # some Linux versions
Sys.setlocale("LC_ALL", "German.UTF-8") # Windows
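
Because the accepted locale names differ between platforms, it can help to try several candidates and then check whether the session really ended up in UTF-8. The following is a minimal sketch, not a definitive recipe: the helpers is_utf8_session and try_locales are hypothetical names introduced here, and the candidate locale names are assumptions that may not be installed on a given machine. It relies on Sys.setlocale returning an empty string when a request cannot be honored.

```r
# Hypothetical helper: TRUE if the current session uses a UTF-8 locale.
is_utf8_session <- function() {
  isTRUE(l10n_info()[["UTF-8"]])
}

# Hypothetical helper: try each candidate locale name in turn and return
# the first one the OS accepts, or NA if none could be set.
try_locales <- function(candidates, category = "LC_CTYPE") {
  for (loc in candidates) {
    # Sys.setlocale warns and returns "" when the request fails.
    res <- suppressWarnings(Sys.setlocale(category, loc))
    if (!is.null(res) && nzchar(res)) {
      return(loc)
    }
  }
  NA_character_
}

# Candidate names are assumptions; adjust them to what your system offers.
chosen <- try_locales(c("cs_CZ.UTF-8", "en_US.UTF-8", "C.UTF-8", "Czech.UTF-8"))
cat("Locale set to:", chosen, "\n")
cat("UTF-8 session:", is_utf8_session(), "\n")

# \u escapes keep this script pure ASCII, so the check does not depend on
# how the source file itself is encoded.
x <- "P\u0159\u00edli\u0161"
print(utf8ToInt(x))  # code points of the stored string
```

If the code points come back correct while print() still shows stripped diacritics, the string itself is stored intact and only the display is affected.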

CodePudding user response:

This is an R bug that's fixed in the latest version, 4.2.1. I was able to reproduce the problem on Windows in the RStudio terminal with 4.1.3.

Upgrading to 4.2.1 fixed the problem:

R version 4.2.1 (2022-06-23 ucrt) -- "Funny-Looking Kid"
Copyright (C) 2022 The R Foundation for Statistical Computing
Platform: x86_64-w64-mingw32/x64 (64-bit)
...
> print("Příliš žluťoučký kůň ů úpěl ďábelské ódy")
[1] "Příliš žluťoučký kůň ů úpěl ďábelské ódy"

There was no need to change any locale or codepage settings.

It seems that 4.1.3 tries to change terminal settings to "fix" codepage issues. Once R 4.1.3 is running in a terminal window, pasted text loses its diacritics, so I can't even paste the original string.

Running R 4.2.1 in the same terminal after 4.1.3 works just fine. At the very least, this means that 4.1.3 was modifying the console settings.
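
To see whether a session is affected, you could check the platform and R version first. This is a rough sketch: needs_upgrade is a hypothetical helper, and the 4.2.1 threshold is simply the version tested above, not a documented cutoff.

```r
# Hypothetical helper: TRUE when running on Windows with an R version
# older than the one reported as working in this answer (4.2.1).
needs_upgrade <- function(version = getRversion(),
                          os = .Platform$OS.type) {
  os == "windows" && as.numeric_version(version) < "4.2.1"
}

if (needs_upgrade()) {
  message("R ", getRversion(), " on Windows: consider upgrading to 4.2.1 or later.")
} else {
  message("R ", getRversion(), " on ", .Platform$OS.type, ": no upgrade suggested by this check.")
}
```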

Tags: r