Home > Software engineering >  How can i convert logical columns to numeric in R using dplyr?
How can i convert logical columns to numeric in R using dplyr?

Time:11-16

i have a data frame that i import it from an excel (.xlsx) file and looks like this :

>data

# A tibble: 3,338 x 4
   Dates               A     B     C    
   <dttm>              <lgl> <lgl> <lgl>
 1 2009-01-05 00:00:00 NA    NA    NA   
 2 2009-01-06 00:00:00 NA    NA    NA   
 3 2009-01-07 00:00:00 NA    NA    NA   
 4 2009-01-08 00:00:00 NA    NA    NA   
 5 2009-01-09 00:00:00 NA    NA    NA   
 6 2009-01-12 00:00:00 NA    NA    NA   
 7 2009-01-13 00:00:00 NA    NA    NA   
 8 2009-01-14 00:00:00 NA    NA    NA   
 9 2009-01-15 00:00:00 NA    NA    NA   
10 2009-01-16 00:00:00 NA    NA    NA   
# ... with 3,328 more rows
# i Use `print(n = ...)` to see more rows


The problem is that these three columns A,B,C contain numeric values but after some 3 thousand rows.

but trying to convert them into numeric i did :

data%>%
  dplyr::mutate(date = as.Date(Dates))%>%
  dplyr::select(-Dates)%>%
  dplyr::relocate(date,.before="A")%>%
  dplyr::mutate_if(is.logical, as.numeric)%>%
  tidyr::pivot_longer(!date,names_to = "var", values_to = "y")%>%
  dplyr::group_by(var)%>%
  dplyr::arrange(var)%>%
  tidyr::drop_na()


but the problem remains :

   date       var       y
   <date>     <chr> <dbl>
 1 2021-11-30 A         1
 2 2021-12-01 A         1
 3 2021-12-02 A         1
 4 2021-12-03 A         1
 5 2021-12-06 A         1
 6 2021-12-07 A         1
 7 2021-12-08 A         1
 8 2021-12-09 A         1
 9 2021-12-10 A         1
10 2021-12-13 A         1
# ... with 189 more rows

any help ?

CodePudding user response:

Summing up from comments:
it's usually easier to fix conversation errors closer to original source as possible.

read_xlsx tries to guess column types by checking first guess_max rows, guess_max being a read_xlsx parameter with a default value of min(1000, n_max). If read_xlsx gets columns types wrong because those cols are filled with NA for the first 1000 rows, just increasing guess_max parameter might be a viable solution:

readxl::read_xlsx("file.xlsx", guess_max = 5000)

Though for a simple 4-column dataset one shouldn't need more than defining correct column types manually:

readxl::read_xlsx("file.xlsx", col_types = c("date", "numeric", "numeric", "numeric"))

If NA values are only at the beginning of some columns, changing sorting order in Excel itself and moving NAs from top before importing the file into R should also work.

  • Related