Issues with accent when using the "separate" function from tidyverse

Time:05-06

I am using the separate function from the tidyverse to split the first column of this tibble:

# A tibble: 6,951 x 9
   Row.names                    Number_of_analysis~ DL_Minimum DL_Mean DL_Maximum Number_of_measur~ Measure_Minimum Measure_Mean Measure_Maximum
   <I<chr>>                                   <dbl>      <dbl>   <dbl>      <dbl>             <dbl>           <dbl>        <dbl>           <dbl>
 1 2011.FACILITY.PONT-À-CELLES                  52       0.6    1.81        16                   0             0          0                 0  
 2 2011.FACILITY.PONT-À-CELLES                  52       0.07   0.177        1.3                 0             0          0                 0  
 3 2011.FACILITY.CHARLEROI                      52       0.07   0.212        1.9                 0             0          0                 0  
 4 2011.FACILITY.CHARLEROI                      52       0.08   0.209        2                   0             0          0                 0  

Merge_splitnames <- Merge %>%
  separate(Row.names, sep = "\\.", into = c("Year", "Catchment", "Locality"), extra = "drop")

Although everything seems correct, the output tibble is missing the first 2 rows (the ones whose locality name contains a French accent):

# A tibble: 6,951 x 9
   Year    Catchment    Locality                    Number_of_analysis~ DL_Minimum DL_Mean DL_Maximum Number_of_measur~ Measure_Minimum Measure_Mean Measure_Maximum
   <I<chr>>                                   <dbl>      <dbl>   <dbl>      <dbl>             <dbl>           <dbl>        <dbl>           <dbl>
 3 2011    FACILITY     CHARLEROI                      52       0.07   0.212        1.9                 0             0          0                 0  
 4 2011    FACILITY     CHARLEROI                      52       0.08   0.209        2                   0             0          0                 0  

Any idea how to deal with this issue? I want to keep the real French name (with the accent). This is quite surprising to me, since I have never had any issue with the other tidyverse functions.

NB: this is a simplified, reproducible example; my real tibble is about 100 times bigger.

Answer:

separate retains the accent for me:

library(tidyverse)

tribble(
  ~names,
  "2011.FACILITY.PONT-À-CELLES",
  "2011.FACILITY.PONT-À-CELLES",
  "2011.FACILITY.CHARLEROI",
  "2011.FACILITY.CHARLEROI"
) %>%
  separate(names, sep = "\\.", into = c("Year", "Catchment", "Locality"))
#> # A tibble: 4 × 3
#>   Year  Catchment Locality     
#>   <chr> <chr>     <chr>        
#> 1 2011  FACILITY  PONT-À-CELLES
#> 2 2011  FACILITY  PONT-À-CELLES
#> 3 2011  FACILITY  CHARLEROI    
#> 4 2011  FACILITY  CHARLEROI

Created on 2022-05-06 by the reprex package (v2.0.1)
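Since separate keeps the accent on freshly typed strings, one thing worth checking is how the real Row.names column differs from them. The question's printout types it as <I<chr>>, an AsIs character vector, which is typically what base merge() produces for Row.names when merging by row names (by = 0). Below is a minimal diagnostic sketch, assuming a small hypothetical stand-in for the question's Merge object (the real data is not shown, so whether this restores the missing rows is not guaranteed). It inspects the class, declared encoding and locale, then converts the column to a plain UTF-8 character vector before splitting on literal dots.

```r
library(tidyr)

# Hypothetical, cut-down stand-in for the question's `Merge` data frame.
# I() mimics the AsIs Row.names column that merge(..., by = 0) creates.
Merge <- data.frame(
  Row.names = I(c("2011.FACILITY.PONT-À-CELLES", "2011.FACILITY.CHARLEROI")),
  Number_of_analysis = c(52L, 52L)
)

# Inspect the column class, the declared string encoding and the locale
class(Merge$Row.names)
Encoding(Merge$Row.names)
Sys.getlocale("LC_CTYPE")

# Drop the AsIs class (as.character strips attributes) and mark strings as UTF-8
Merge$Row.names <- enc2utf8(as.character(Merge$Row.names))

# Split on literal dots only, so the hyphens in "PONT-À-CELLES" are untouched
Merge_splitnames <- separate(
  Merge, Row.names,
  into = c("Year", "Catchment", "Locality"),
  sep = "\\.", extra = "drop"
)
Merge_splitnames
```

If the stand-in splits correctly but the real data still loses rows, comparing Encoding() on the affected rows of the real column is a reasonable next step.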

Answer:

Assuming DF is the data frame shown reproducibly in the Note at the end, use extra = "merge" in separate. (You may need to change your locale, which you can check with Sys.getlocale(), but I did not need to.)

library(tidyr)

DF %>%
  separate(Row.names, c("Year", "Catchment", "Locality"), extra = "merge")

giving:

  Year Catchment      Locality Number_of_analysis~ DL_Minimum DL_Mean
1 2011  FACILITY PONT-À-CELLES                  52       0.60   1.810
2 2011  FACILITY PONT-À-CELLES                  52       0.07   0.177
3 2011  FACILITY     CHARLEROI                  52       0.07   0.212
4 2011  FACILITY     CHARLEROI                  52       0.08   0.209
  DL_Maximum Number_of_measur~ Measure_Minimum Measure_Mean Measure_Maximum
1       16.0                 0               0            0               0
2        1.3                 0               0            0               0
3        1.9                 0               0            0               0
4        2.0                 0               0            0               0

Note

DF <- 
structure(list(Row.names = c("2011.FACILITY.PONT-À-CELLES", "2011.FACILITY.PONT-À-CELLES", 
"2011.FACILITY.CHARLEROI", "2011.FACILITY.CHARLEROI"), `Number_of_analysis~` = c(52L, 
52L, 52L, 52L), DL_Minimum = c(0.6, 0.07, 0.07, 0.08), DL_Mean = c(1.81, 
0.177, 0.212, 0.209), DL_Maximum = c(16, 1.3, 1.9, 2), `Number_of_measur~` = c(0L, 
0L, 0L, 0L), Measure_Minimum = c(0L, 0L, 0L, 0L), Measure_Mean = c(0L, 
0L, 0L, 0L), Measure_Maximum = c(0L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1", 
"2", "3", "4"))
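This answer does not pass sep, so separate falls back on its default separator, the regular expression "[^[:alnum:]]+", which also matches the hyphens in "PONT-À-CELLES". The sketch below, using a small made-up input vector, compares extra = "drop" and extra = "merge" under that default with an explicit sep = "\\.".

```r
library(tidyr)

# Small illustrative input: the first locality name contains hyphens
df <- data.frame(Row.names = c("2011.FACILITY.PONT-À-CELLES",
                               "2011.FACILITY.CHARLEROI"))

# Default sep "[^[:alnum:]]+" also splits on "-"; extra pieces are discarded
dropped <- separate(df, Row.names, c("Year", "Catchment", "Locality"),
                    extra = "drop")

# Same default sep, but everything after the second split is kept in Locality
merged <- separate(df, Row.names, c("Year", "Catchment", "Locality"),
                   extra = "merge")

# Splitting on literal dots only never touches the hyphens
by_dot <- separate(df, Row.names, c("Year", "Catchment", "Locality"),
                   sep = "\\.")

dropped$Locality  # first name truncated to "PONT"
merged$Locality   # full names kept
by_dot$Locality   # full names kept
```

Both extra = "merge" and an explicit sep = "\\." keep the full locality name; the explicit separator states the intent more directly, while extra = "merge" also copes with inputs that contain more dots than expected.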