We would like to manage the columns from csv
file with originally three character
columns class when we used the fread()
with the arguments detailed in the R code used
section (separator and decimal arguments are specified). The R session version is 4.2.0
and the data.table
version is 1.14.2
.
Input data from csv
file
col_1,col_2, col_3
/100.432,"30,84Â %","4,14"
/3.200,"62,89Â %","1,89"
/10.100,"50,00Â %","1,62"
/15.570, "40,10Â %","3,41"
/900.310, "8,00Â %","0,10"
Input data in R
session
> dat
# A tibble: 5 × 3
col_1 col_2 col_3
<chr> <chr> <chr>
1 100.432 30,84 % 4,14
2 3.200 62,89 % 1,89
3 10.100 50,00 % 1,62
4 15.570 40,10 % 3,41
5 900.310 8,00 % 0,10
R code used
data.table::fread(
x,
sep = ',',
dec = '.',
na.strings = c('', 'NA')) %>%
as_tibble()
Desired output data
> dat
# A tibble: 5 × 3
col_1 col_2 col_3
<dbl> <dbl> <dbl>
1 100438 30.84 4.14
2 3200 62.89 1.89
3 10100 50.00 1.62
4 15570 40.10 3.41
5 900310 8.00 0.10
Question
We would like to obtain the Desired output data
format.
Thanks in advance
CodePudding user response:
You could just do some postprocessing in R:
RAW = fread('col_1,col_2, col_3
/100.432,"30,84Â %","4,14"
/3.200,"62,89Â %","1,89"
/10.100,"50,00Â %","1,62"
/15.570, "40,10Â %","3,41"
/900.310, "8,00Â %","0,10"')
# col_1 col_2 col_3
# <char> <char> <char>
# 1: /100.432 30,84Â % 4,14
# 2: /3.200 62,89Â % 1,89
# 3: /10.100 50,00Â % 1,62
# 4: /15.570 40,10Â % 3,41
# 5: /900.310 8,00Â % 0,10
library(readr)
RAW[, lapply(.SD, \(x) parse_number(x, locale = locale(decimal_mark = ",")))]
# Or in base R:
RAW[, lapply(.SD, \(x) gsub("[^0-9.]", "", chartr(".,", "_.", x)) |> as.numeric())]
# col_1 col_2 col_3
# <num> <num> <num>
# 1: 100432 30.84 4.14
# 2: 3200 62.89 1.89
# 3: 10100 50.00 1.62
# 4: 15570 40.10 3.41
# 5: 900310 8.00 0.10