i have a data frame that i import it from an excel (.xlsx) file and looks like this :
>data
# A tibble: 3,338 x 4
Dates A B C
<dttm> <lgl> <lgl> <lgl>
1 2009-01-05 00:00:00 NA NA NA
2 2009-01-06 00:00:00 NA NA NA
3 2009-01-07 00:00:00 NA NA NA
4 2009-01-08 00:00:00 NA NA NA
5 2009-01-09 00:00:00 NA NA NA
6 2009-01-12 00:00:00 NA NA NA
7 2009-01-13 00:00:00 NA NA NA
8 2009-01-14 00:00:00 NA NA NA
9 2009-01-15 00:00:00 NA NA NA
10 2009-01-16 00:00:00 NA NA NA
# ... with 3,328 more rows
# i Use `print(n = ...)` to see more rows
The problem is that these three columns A,B,C contain numeric values but after some 3 thousand rows.
but trying to convert them into numeric i did :
data%>%
dplyr::mutate(date = as.Date(Dates))%>%
dplyr::select(-Dates)%>%
dplyr::relocate(date,.before="A")%>%
dplyr::mutate_if(is.logical, as.numeric)%>%
tidyr::pivot_longer(!date,names_to = "var", values_to = "y")%>%
dplyr::group_by(var)%>%
dplyr::arrange(var)%>%
tidyr::drop_na()
but the problem remains :
date var y
<date> <chr> <dbl>
1 2021-11-30 A 1
2 2021-12-01 A 1
3 2021-12-02 A 1
4 2021-12-03 A 1
5 2021-12-06 A 1
6 2021-12-07 A 1
7 2021-12-08 A 1
8 2021-12-09 A 1
9 2021-12-10 A 1
10 2021-12-13 A 1
# ... with 189 more rows
any help ?
CodePudding user response:
Summing up from comments:
it's usually easier to fix conversation errors closer to original source as possible.
read_xlsx
tries to guess column types by checking first guess_max
rows, guess_max
being a read_xlsx
parameter with a default value of min(1000, n_max)
. If read_xlsx
gets columns types wrong because those cols are filled with NA
for the first 1000 rows, just increasing guess_max
parameter might be a viable solution:
readxl::read_xlsx("file.xlsx", guess_max = 5000)
Though for a simple 4-column dataset one shouldn't need more than defining correct column types manually:
readxl::read_xlsx("file.xlsx", col_types = c("date", "numeric", "numeric", "numeric"))
If NA
values are only at the beginning of some columns, changing sorting order in Excel itself and moving NA
s from top before importing the file into R should also work.